Calculating EMA from SMA in Python

I've defined a function that calculates the SMA (Simple Moving Average), compares its value to the closing price 'p', and gives a signal for whether we should buy or sell, represented by 1 or -1 respectively.
I now want to create a function for the EMA (Exponential Moving Average). The formula is EMA = Closing price x multiplier + EMA (previous day) x (1 - multiplier), where multiplier = 2 ÷ (number of observations + 1).
However, I'm not sure how to do this in Spyder/Python.
My code for SMA is:
import numpy as np

def SMA(p, window=10, signal_type='buy only'):
    # input: price "p", look-back window "window",
    # signal type = 'buy only' (default) gives long signals, 'sell only' gives sell signals,
    # 'both' gives both long and short signals
    # returns an array of signals: 1 for a long position and -1 for a short position
    signals = np.zeros(len(p))
    if len(p) < window:
        # no signal if there is not enough data
        return signals
    sma = list(np.zeros(window) + np.nan)  # the first few prices do not give indicator values
    sma += [np.average(p[k:k+window]) for k in np.arange(len(p) - window)]
    for i in np.arange(len(p) - 1):
        if np.isnan(sma[i]):
            continue  # skip the opening window of the market
        if sma[i] < p[i] and (signal_type == 'buy only' or signal_type == 'both'):
            signals[i] = 1
        elif sma[i] > p[i] and (signal_type == 'sell only' or signal_type == 'both'):
            signals[i] = -1
    return signals
This gets called from another file as shown below. I want to be able to change just 'backtest.SMA' to 'backtest.EMA' so I can see the signals the EMA produces:
signals = backtest.SMA(this_dat['Close'].values, window=10)
(this_dat['Close'].values is the array of closing prices taken from a table.)
I defined EMA in the same way as SMA, but I'm not sure how to compute the EMA values themselves:
def EMA(p, window=10, signal_type='buy only'):
    signals = np.zeros(len(p))
    if len(p) < window:
        # no signal if there is not enough data
        return signals
    multiplier = 2 / (window + 1)  # number of observations = look-back window
    ema = list(np.zeros(window) + np.nan)
    # ema = Closing price x multiplier + EMA (previous day) x (1 - multiplier)
    # ema += ...   <- this is the part I can't figure out
    for i in np.arange(len(p) - 1):
        if np.isnan(ema[i]):
            continue  # skip the opening window of the market
        if ema[i] < p[i] and (signal_type == 'buy only' or signal_type == 'both'):
            signals[i] = 1
        elif ema[i] > p[i] and (signal_type == 'sell only' or signal_type == 'both'):
            signals[i] = -1
    return signals
I need help writing that EMA formula in Python; I can't figure it out. If anyone knows a package that might work, I'm open to trying it, but I would prefer to compute the EMA using my SMA function as a template.
Any help would be greatly appreciated, thanks
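A minimal sketch of an EMA version of the SMA template, assuming the EMA is seeded with the simple average of the first 'window' prices and that "number of observations" means the look-back window (so multiplier = 2 / (window + 1)). The helper name ema_series is hypothetical; the signal rules are copied from the SMA function.

```python
import numpy as np


def ema_series(p, window=10):
    """EMA of the prices p; the first window-1 entries are NaN.

    Assumption: seeded with the simple average of the first `window` prices,
    multiplier = 2 / (window + 1).
    """
    p = np.asarray(p, dtype=float)
    ema = np.full(len(p), np.nan)
    if len(p) < window:
        return ema
    multiplier = 2 / (window + 1)
    ema[window - 1] = p[:window].mean()  # seed value
    for t in range(window, len(p)):
        # EMA = Closing price x multiplier + EMA (previous day) x (1 - multiplier)
        ema[t] = p[t] * multiplier + ema[t - 1] * (1 - multiplier)
    return ema


def EMA(p, window=10, signal_type='buy only'):
    """Same signal rules as the SMA function, with the EMA as the indicator."""
    signals = np.zeros(len(p))
    if len(p) < window:
        # no signal if there is not enough data
        return signals
    ema = ema_series(p, window)
    for i in range(len(p) - 1):
        if np.isnan(ema[i]):
            continue  # skip the opening window
        if ema[i] < p[i] and signal_type in ('buy only', 'both'):
            signals[i] = 1
        elif ema[i] > p[i] and signal_type in ('sell only', 'both'):
            signals[i] = -1
    return signals
```

Note that in the SMA template sma[i] averages the 'window' prices before p[i], whereas here ema[i] already includes p[i]; comparing p[i] with ema[i-1] would reproduce the one-bar lag. If you prefer a package, pandas' Series.ewm(span=window, adjust=False).mean() applies the same recursion with multiplier 2 / (span + 1), but it starts from the first price rather than from an SMA seed, so the early values differ.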

Related

Define the reward function to minimize costs

I have a problem with my reinforcement learning model. I am trying to simulate an electric battery storage system.
The battery charges when electricity prices are low and discharges ONLY to the user, at fixed hours every day.
Therefore, the only cost for the user is the charging power x the electricity price in that hour.
The reward function is set as the negative of the cumulative sum of the cost.
Is this a correct approach? How should I define it so that the overall cost of the purchased electricity is minimized by the end of the year?
The problem I have is that the battery always stays near its maximum capacity and never takes advantage of the full range of available MWh.
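As a sketch of the cost model just described (cost = charging power x hourly price, reward = negative cumulative cost), the hypothetical helpers below compute one day's charging cost and the resulting reward sequence. They deliberately ignore the capacity limit and the grid purchases that _take_action handles further down.

```python
import numpy as np

P_charge = 2  # MW, same value as in the parameters below


def daily_charging_cost(prices, action, p_charge=P_charge):
    """Cost of one day's schedule: sum over hours of charging power x hourly price.

    Ignores the capacity limit and the grid purchases made when the battery runs empty.
    """
    prices = np.asarray(prices, dtype=float)
    action = np.asarray(action)
    return float(np.sum(action * prices * p_charge))


def cumulative_rewards(daily_costs):
    """Reward after each day, defined as the negative of the running total of cost."""
    return -np.cumsum(daily_costs)
```

With this definition, the reward on day t also contains the costs of every earlier day, not only the cost caused by that day's action.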
1. Define a DataFrame that stores fictitious hourly electricity prices for 365 days (24 rows = hours, 365 columns = days)
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 500, size=(24, 365)))
2. Define the main parameters
Lookback_window_size=7
Current_day=Lookback_window_size
P_charge=2 #MW
P_discharge=3 #MW
3. Define the class Battery(Env)
from gym import Env, spaces
import numpy as np

class Battery(Env):
    metadata = {'render.modes': ['human']}

    def __init__(self, df):
        # Import the dataframe
        self.df = df
        # The action space is a 1D array of shape (24,). Since we are simulating a day-ahead market,
        # the action space returns the whole day's charge / no-charge schedule.
        # action = 1 means that we charge the battery, action = 0 means that we don't charge
        self.action_space = spaces.MultiBinary(24)
        # The observation space is a 1D array. With a look-back window of 1 day, the first 48 columns
        # hold the electricity prices for the current day plus all the days in the look-back window.
        # The last two columns store the SOC (state of charge) at the end of the day and the overall
        # cost (how much we paid for electricity).
        self.observation_shape = (int((Lookback_window_size + 1) * 24 + 2),)
        self.observation_space = spaces.Box(low=0, high=np.inf, shape=self.observation_shape, dtype=np.float64)
        self.last_action = None

    def _next_observation(self):
        # Add the prices of the last days to the monitor matrix
        prices = []
        for i in range(self.Current_day - Lookback_window_size, self.Current_day + 1):
            prices = np.concatenate([prices, self.df.iloc[0:, i].values])
        # Add extra values to monitor: SOC and cost
        extra = [self.SOC, self.Cost]
        obs = np.concatenate([prices, extra])
        return obs

    def _take_action(self, action):
        # Since the action is an array, the loop checks the action at every hour (action[i])
        # and updates the cost and the state of charge
        self.capacity = 200  # MWh
        for i, x in enumerate(action):
            # When action = 1 we charge the battery; if action = 0 we don't charge
            if x == 1:
                # The cost increases by the electricity price at that hour
                self.Cost += self.df[self.Current_day][i] * P_charge
                # If we charge, the state of charge (SOC) increases as well
                self.SOC += P_charge
            # Every day we discharge the battery at the same hours
            if i in range(8, 14):
                self.SOC -= P_discharge
                # If the battery is depleted, we buy the shortfall directly from the grid
                if self.SOC < 0:
                    self.Cost += self.df[self.Current_day][i+1] * (-self.SOC)
                    self.SOC = 0
            # The battery cannot charge above the capacity threshold
            if self.capacity is not None:
                if self.SOC > self.capacity:
                    # Subtract the latest cost, since that charge could not have happened with the SOC above the maximum
                    self.Cost -= self.df[self.Current_day][i] * P_charge
                # Clip the SOC to the capacity threshold
                self.SOC = min(self.SOC, self.capacity)

    def step(self, action):
        # Execute one time step within the environment
        self._take_action(action)
        self.last_action = action
        self.Current_day += 1
        # Maximizing the reward means minimizing the costs
        reward = -self.Cost
        # Stop at the end of the dataframe
        done = self.Current_day >= len(self.df.columns) - 1
        obs = self._next_observation()
        return obs, reward, done, {}

    def render(self, mode='human', close=False):
        print(f'Day: {self.Current_day}')
        print(f'SOC: {self.SOC}')
        print(f'Cost: {self.Cost}')
        print(f'Actions: {self.last_action}')

    def reset(self):
        self.Current_day = Lookback_window_size
        # Give an initial SOC value
        self.SOC = 50
        # Cost at day 0 is zero
        self.Cost = 0
        return self._next_observation()

RSI in Spyder using data in Excel

I have an Excel file containing data on a specific stock.
It contains about 2 months of data at 5-minute intervals: the Open, Close, High and Low prices and the trading Volume, so there are about 3000 rows in the file.
I want to calculate the RSI (or the EMA, if that's easier) of the stock for each day. I'm making a summary table that collects the daily data, so it converts my table of 3000+ rows into a table of only about 60 rows (one row per day).
Essentially, I want code that groups the Excel data by date and then calculates the RSI as a single value for each day. RSI is given by 100 - (100 / (1 + RS)), where RS = average gain of up periods / average loss of down periods.
Note: my file has a 'Datetime' column, so each row's 'Datetime' looks something like '2022-03-03 9:30-5:00' and the next row would be '2022-03-03 9:35-5:00', etc. So the code needs to look only at the date and ignore the time.
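A minimal sketch of the RSI formula above for one day's price changes, using plain averages of the up and down periods as described in the question (not Wilder's smoothed averages). The function name rsi is hypothetical; losses are averaged as positive magnitudes so that RS is non-negative.

```python
import numpy as np


def rsi(changes):
    """RSI = 100 - 100 / (1 + RS), with RS = average gain / average loss.

    changes: per-period price changes for one day. Losses are averaged as
    positive magnitudes so that RS >= 0 and the RSI lies between 0 and 100.
    """
    changes = np.asarray(changes, dtype=float)
    gains = changes[changes > 0]
    losses = -changes[changes < 0]
    if gains.size == 0 and losses.size == 0:
        return float("nan")  # no up or down periods: RSI undefined
    if losses.size == 0:
        return 100.0  # no down periods: RS is unbounded
    avg_gain = gains.mean() if gains.size else 0.0
    rs = avg_gain / losses.mean()
    return 100 - 100 / (1 + rs)


print(rsi([1.0, -0.5, 2.0, -1.5]))  # 60.0
```

Here the average gain is 1.5 and the average loss is 1.0, so RS = 1.5 and RSI = 100 - 100 / 2.5 = 60.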
Here is some code that may help explain what I'm looking for.
This reads my file. I want the code to take the loaded data, group it by date and then calculate the RSI of each day using the formula above.
import numpy as np
import pandas as pd
import backtest  # my own module with get_dates() and SMA()

dat = pd.read_csv('AMD_5m.csv', index_col='Datetime')
dat.index = pd.to_datetime(dat.index, utc=True)
dates = backtest.get_dates(dat.index)
# create a summary table
cols = ['Num. Obs.', 'Num. Trade', 'PnL', 'Win. Ratio', 'RSI']  # add additional fields if necessary
summary_table = pd.DataFrame(index=dates, columns=cols)
# loop backtest by dates
This is the code I used to fill in the other columns of my summary table. I'll put my SMA (simple moving average) function below.
for d in dates:
    this_dat = dat.loc[dat.index.date == d]
    # find the number of observations on date d
    summary_table.loc[d, 'Num. Obs.'] = this_dat.shape[0]
    # get trading (i.e. position holding) signals
    signals = backtest.SMA(this_dat['Close'].values, window=10)
    # find the number of trades on date d
    summary_table.loc[d, 'Num. Trade'] = np.sum(np.diff(signals) == 1)
    # find PnLs for 100 shares
    shares = 100
    PnL = -shares * np.sum(this_dat['Close'].values[1:] * np.diff(signals))
    if np.sum(np.diff(signals)) > 0:
        # close position at market close
        PnL += shares * this_dat['Close'].values[-1]
    summary_table.loc[d, 'PnL'] = PnL
    # find the win ratio
    ind_in = np.where(np.diff(signals) == 1)[0] + 1
    ind_out = np.where(np.diff(signals) == -1)[0] + 1
    num_win = np.sum((this_dat['Close'].values[ind_out] - this_dat['Close'].values[ind_in]) > 0)
    if summary_table.loc[d, 'Num. Trade'] != 0:
        summary_table.loc[d, 'Win. Ratio'] = 1. * num_win / summary_table.loc[d, 'Num. Trade']
This is my function for calculating the Simple Moving Average. I was told to try to adapt it for the RSI or for the EMA (Exponential Moving Average). Apparently adapting it for the EMA isn't too troublesome, but I can't figure it out.
import numpy as np

def SMA(p, window=10, signal_type='buy only'):
    # input: price "p", look-back window "window",
    # signal type = 'buy only' (default) gives long signals, 'sell only' gives sell signals,
    # 'both' gives both long and short signals
    # returns an array of signals: 1 for a long position and -1 for a short position
    signals = np.zeros(len(p))
    if len(p) < window:
        # no signal if there is not enough data
        return signals
    sma = list(np.zeros(window) + np.nan)  # the first few prices do not give indicator values
    sma += [np.average(p[k:k+window]) for k in np.arange(len(p) - window)]
    for i in np.arange(len(p) - 1):
        if np.isnan(sma[i]):
            continue  # skip the opening window of the market
        if sma[i] < p[i] and (signal_type == 'buy only' or signal_type == 'both'):
            signals[i] = 1
        elif sma[i] > p[i] and (signal_type == 'sell only' or signal_type == 'both'):
            signals[i] = -1
    return signals
I have two solutions to this. One is to loop through each group and add the relevant data to the summary_table; the other is to calculate the whole series and set the RSI column to it.
I first recreated the data:
import yfinance
import pandas as pd
# initially created similar data through yfinance,
# then copied this to Excel and changed the Datetime column to match yours.
df = yfinance.download("AAPL", period="60d", interval="5m")
# copied it and read it as a dataframe
df = pd.read_clipboard(sep=r'\s{2,}', engine="python")
df.head()
# Datetime Open High Low Close Adj Close Volume
#0 2022-03-03 09:30-05:00 168.470001 168.910004 167.970001 168.199905 168.199905 5374241
#1 2022-03-03 09:35-05:00 168.199997 168.289993 167.550003 168.129898 168.129898 1936734
#2 2022-03-03 09:40-05:00 168.119995 168.250000 167.740005 167.770004 167.770004 1198687
#3 2022-03-03 09:45-05:00 167.770004 168.339996 167.589996 167.718094 167.718094 2128957
#4 2022-03-03 09:50-05:00 167.729996 167.970001 167.619995 167.710007 167.710007 968410
Then I formatted the data and created the summary_table:
df["date"] = pd.to_datetime(df["Datetime"].str[:16], format="%Y-%m-%d %H:%M").dt.date
# calculate percentage change from open and close of each row
df["gain"] = (df["Close"] / df["Open"]) - 1
# your summary table, slightly changing the index to use the dates above
cols = ['Num. Obs.', 'Num. Trade', 'PnL', 'Win. Ratio', 'RSI']  # add additional fields if necessary
summary_table = pd.DataFrame(index=df["date"].unique(), columns=cols)
Option 1:
# loop through each group, calculate the average gain and loss, then RSI
for grp, data in df.groupby("date"):
    # average gain for gain greater than 0
    average_gain = data[data["gain"] > 0]["gain"].mean()
    # average loss for gain less than 0
    average_loss = data[data["gain"] < 0]["gain"].mean()
    # add to relevant cell of summary_table
    summary_table.loc[grp, "RSI"] = 100 - (100 / (1 + (average_gain / average_loss)))
Option 2:
# define a function to apply in the groupby
def rsi_calc(series):
    avg_gain = series[series > 0].mean()
    avg_loss = series[series < 0].mean()
    return 100 - (100 / (1 + (avg_gain / avg_loss)))

summary_table["RSI"] = df.groupby("date")["gain"].apply(rsi_calc)
Output (same for each):
summary_table.head()
# Num. Obs. Num. Trade PnL Win. Ratio RSI
#2022-03-03 NaN NaN NaN NaN -981.214015
#2022-03-04 NaN NaN NaN NaN 501.950956
#2022-03-07 NaN NaN NaN NaN -228.379066
#2022-03-08 NaN NaN NaN NaN -2304.451654
#2022-03-09 NaN NaN NaN NaN -689.824739
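The RSI formula only yields values between 0 and 100 when RS is non-negative. Because average_loss above is the mean of negative numbers, RS comes out negative, which is why the RSI column shows values such as -981.2 and 501.9. A minimal adjustment, assuming the same per-row "gain" column, is to average the absolute losses; the two-day DataFrame below is made-up data for illustration only.

```python
import pandas as pd


def rsi_calc(series):
    # average gain over up periods, average loss *magnitude* over down periods
    avg_gain = series[series > 0].mean()
    avg_loss = series[series < 0].abs().mean()
    return 100 - (100 / (1 + (avg_gain / avg_loss)))


# made-up example: a few 5-minute percentage changes on two days
df = pd.DataFrame({
    "date": ["2022-03-03"] * 4 + ["2022-03-04"] * 4,
    "gain": [0.01, -0.005, 0.02, -0.015, 0.01, -0.01, 0.01, -0.01],
})
rsi_by_day = df.groupby("date")["gain"].apply(rsi_calc)
print(rsi_by_day)
```

With the real data, the assignment would be summary_table["RSI"] = df.groupby("date")["gain"].apply(rsi_calc), exactly as in Option 2.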

How to use min/max with shift function in pandas?

I have time-series data (EUR/USD).
I want to create a new column based on these conditions (they are easier to understand by reading my code):
if the minimum of the 3 previous high prices is less than or equal to the current price, it should be 'BUY_SIGNAL', and if the maximum of the 3 previous low prices is higher than or equal to the current price, it should be 'SELL_SIGNAL'.
Here is what my table looks like:
DATE OPEN HIGH LOW CLOSE
0 1990.09.28 1.25260 1.25430 1.24680 1.24890
1 1990.10.01 1.25170 1.26500 1.25170 1.25480
2 1990.10.02 1.25520 1.26390 1.25240 1.26330
3 1990.10.03 1.26350 1.27000 1.26030 1.26840
4 1990.10.04 1.26810 1.27750 1.26710 1.27590
And this is my code (I tried writing 2 functions, and neither works):
def target_label(df):
    if df['HIGH'] >= [df['HIGH'].shift(1), df['HIGH'].shift(2), df['HIGH'].shift(3)].min(axis=1):
        return 'BUY_SIGNAL'
    if df['LOW'] >= [df['LOW'].shift(1), df['LOW'].shift(2), df['LOW'].shift(3)].min(axis=1):
        return 'SELL_SIGNAL'
    else:
        return 'NO_SIGNAL'

def target_label(df):
    if df['HIGH'] >= df[['HIGH1', 'HIGH2', 'HIGH3']].min(axis=1):
        return 'BUY_SIGNAL'
    if df['LOW'] <= df[['LOW1', 'LOW2', 'LOW3']].max(axis=1):
        return 'SELL_SIGNAL'
    else:
        return 'NO_SIGNAL'

d_df.apply(lambda df: target_label(df), axis=1)
You can use shift(1).rolling(3).min() to get the minimum of the previous 3 rows. The same works for other functions such as max, mean, etc. Something like the following:
import numpy as np

df['signal'] = np.where(
    df['HIGH'] >= df['HIGH'].shift(1).rolling(3).min(), 'BUY_SIGNAL',
    np.where(
        df['LOW'] <= df['LOW'].shift(1).rolling(3).max(), 'SELL_SIGNAL',
        'NO_SIGNAL'
    )
)
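A small, self-contained check of this approach on the five rows from the question. It follows the sell rule as stated in the question (SELL_SIGNAL when LOW is at most the maximum of the previous three lows), and BUY_SIGNAL takes precedence when both conditions hold. The first three rows get NO_SIGNAL because rolling(3) needs three previous values, and comparisons with NaN are False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "DATE": ["1990.09.28", "1990.10.01", "1990.10.02", "1990.10.03", "1990.10.04"],
    "HIGH": [1.25430, 1.26500, 1.26390, 1.27000, 1.27750],
    "LOW": [1.24680, 1.25170, 1.25240, 1.26030, 1.26710],
})

# min / max over the three rows *before* the current one
prev3_high_min = df["HIGH"].shift(1).rolling(3).min()
prev3_low_max = df["LOW"].shift(1).rolling(3).max()

df["signal"] = np.where(
    df["HIGH"] >= prev3_high_min, "BUY_SIGNAL",
    np.where(df["LOW"] <= prev3_low_max, "SELL_SIGNAL", "NO_SIGNAL"),
)
print(df["signal"].tolist())
# ['NO_SIGNAL', 'NO_SIGNAL', 'NO_SIGNAL', 'BUY_SIGNAL', 'BUY_SIGNAL']
```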

creating signals based on current and prior time periods

I'm trying to write a trading algorithm, and I am very new to Python.
Lots of things are easy to understand, but I get lost easily. I have a strategy I want to use, but the coding is getting in the way.
I want to create two moving averages, and when they cross I want that to be a signal.
The part I'm currently struggling with is also including information about the prior period.
When MovingAverage1(last 10 candles) == MovingAverage2(last 20 candles), that's a signal, but is it a buy or a sell?
When MovingAverage1(last 10 candles, skipping the most recent) > MovingAverage2(last 10 candles, skipping the most recent), then sell.
Here is what I've got so far; the moving averages I'm using have been simplified for this question:
import warnings

import numpy as np
import pandas as pd
# assumes the pyti package (which these helpers appear to be taken from) is installed
from pyti import catch_errors
from pyti.weighted_moving_average import weighted_moving_average as wma

# Function Helper for indicators
def fill_for_noncomputable_vals(input_data, result_data):
    non_computable_values = np.repeat(
        np.nan, len(input_data) - len(result_data)
    )
    filled_result_data = np.append(non_computable_values, result_data)
    return filled_result_data

def simple_moving_average(data, period):
    """
    Simple Moving Average.
    Formula:
    SUM(data / N)
    """
    catch_errors.check_for_period_error(data, period)
    # Mean of Empty Slice RuntimeWarning doesn't affect output so it is
    # suppressed
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        sma = [np.mean(data[idx-(period-1):idx+1]) for idx in range(0, len(data))]
    sma = fill_for_noncomputable_vals(data, sma)
    return sma

def hull_moving_average(data, period):
    """
    Hull Moving Average.
    Formula:
    HMA = WMA(2*WMA(n/2) - WMA(n)), sqrt(n)
    """
    catch_errors.check_for_period_error(data, period)
    hma = wma(
        2 * wma(data, int(period/2)) - wma(data, period), int(np.sqrt(period))
    )
    return hma

class MyMACrossStrategy(Strategy):  # Strategy: base class from my backtesting code (defined elsewhere, not shown)
    """
    Requires:
    symbol - A stock symbol on which to form a strategy.
    bars - A DataFrame of bars for the above symbol.
    short_window - Lookback period for short moving average.
    long_window - Lookback period for long moving average."""
    def __init__(self, symbol, bars, short_window=4, long_window=9):
        self.symbol = symbol
        self.bars = bars
        self.short_window = short_window
        self.long_window = long_window

    def generate_signals(self):
        """Returns the DataFrame of symbols containing the signals
        to go long, short or hold (1, -1 or 0)."""
        signals = pd.DataFrame(index=self.bars.index)
        signals['signal'] = 0.0
        close = self.bars['Close'].values
        # Create the set of moving averages over the
        # respective periods
        signals['Fast_Line'] = simple_moving_average(close, self.short_window)
        signals['Slow_line'] = hull_moving_average(close, self.long_window)
        # The same moving averages for the prior period (skipping the most recent candle)
        signals['Fast_Line_prev'] = signals['Fast_Line'].shift(1)
        signals['Slow_line_prev'] = signals['Slow_line'].shift(1)
        # Create a 'signal' (invested or not invested) when the short moving average crosses the long
        # moving average, but only for the period greater than the shortest moving average window
        signals.iloc[self.short_window:, signals.columns.get_loc('signal')] = np.where(
            signals['Fast_Line'].iloc[self.short_window:]
            > signals['Slow_line'].iloc[self.short_window:], 1.0, 0.0)
        # Take the difference of the signals in order to generate actual trading orders
        signals['positions'] = signals['signal'].diff()
        # Still to do: if signals['Fast_Line'] == signals['Slow_line'] and ... (buy or sell?)
        return signals
Hopefully my question makes sense.
I am assuming that you want to test your strategy before using it in the live market. You can download the stock data from Yahoo Finance in CSV format and load it with the code below:
import pandas as pd
import numpy as np
data = pd.read_csv('MSFT.csv')
Once the data is stored in the pandas DataFrame data, you can compute moving averages of the closing price with the following code.
If you are planning a crossover strategy:
sma_days=20
lma_days=50
data['SMA_20']=data['Close'].rolling(window=sma_days,center=False).mean()
data['SMA_50']=data['Close'].rolling(window=lma_days,center=False).mean()
data['SIGNAL']=np.where(data['SMA_20']>data['SMA_50'],'BUY','SELL')
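The SIGNAL column above records which average is on top, not the moment they cross. To mark the crossing bars themselves, one option is to compare the current bar with the prior bar via shift(1), which is the "skip the most recent candle" comparison from the question. A sketch, with a hypothetical function name:

```python
import numpy as np
import pandas as pd


def crossover_events(close, fast=20, slow=50):
    """Return +1 on bars where the fast SMA crosses above the slow SMA,
    -1 where it crosses below, and 0 otherwise.

    A cross is detected by comparing the current bar with the prior bar
    (the "skip the most recent candle" comparison), via shift(1).
    """
    diff = close.rolling(fast).mean() - close.rolling(slow).mean()
    prev = diff.shift(1)
    up = (prev <= 0) & (diff > 0)    # was at/below, now above -> buy
    down = (prev >= 0) & (diff < 0)  # was at/above, now below -> sell
    return pd.Series(np.where(up, 1, np.where(down, -1, 0)), index=close.index)
```

With the DataFrame above, this could be used as data['CROSS'] = crossover_events(data['Close'], sma_days, lma_days). Exact equality of two floating-point averages is rare, so the sketch treats "was at or above, now below" as the sell condition rather than testing for ==.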

Python - path dependent simulation

I've set up a simulation example below.
Setup:
I have weekly data: say 6 years of data, each week covering around 1000 stocks (some weeks more, other weeks fewer). I randomly choose 75 stocks at time t0. At t1 some stocks die (with probability p, i.e. they go out of fashion) or leave the index (for structural reasons such as a merger). I need to simulate the portfolio so that every week I have exactly 75 stocks. Every week some stocks die (between 0 and 75), and I pick new ones that are not among the existing 75. I also check whether a stock leaves for structural reasons. Every week I calculate the returns of the 75 stocks.
Questions: Is there an obvious way to improve the speed? I started with pandas objects (group, sort), which were too slow. I haven't tried to parallelize the loop. I'm more interested to hear whether I should use numba (but it doesn't have the np.in1d function) or whether there is a faster way to shuffle (I actually only need to shuffle the ones). I've also thought about creating a fixed array of all stock IDs padded with NaN, but the problem is that I need 75 names, so I would still need to filter out the NaNs every week.
Maybe this is too detailed a problem for this forum; I apologize if that's the case.
Code:
from timeit import default_timer
import numpy as np

# Create dataset
n_weeks = 312  # Approximately 6 years of weekly data
n_stocks = np.random.normal(1000, 5, n_weeks).astype(dtype=np.uint16)  # Around 1000 stocks every week but not fixed
idx_new_week = np.cumsum(np.hstack((0, n_stocks)))  # week w occupies rows idx_new_week[w]:idx_new_week[w+1]
# We give each stock a stock ID
n_obs = n_stocks.sum()
stock_id = np.ones([n_obs], dtype=np.uint16)
for j in range(1, n_weeks+1):
    stock_id[idx_new_week[j-1]:idx_new_week[j]] = np.cumsum(np.ones(n_stocks[j-1]))
stock_rtn = np.random.normal(0, 0.25/np.sqrt(52), n_obs)  # Simulated forward (one week ahead) return for each stock

# Simulation part
# Week 0: pick 75 stocks at random
# Week n >= 1: a stock dies for two reasons
# 1) randomness (probability 'p')
# 2) structural event (could be a merger, falling out of the index).
# We cannot assume that it is always the high stock IDs which die for structural reasons (as it looks like here)
# If a stock dies we randomly pick a stock from the "dead" stock dataset (not including the ones which die this week)
n_sim = 100  # I want this to be 1 million
n_stock_cand = 75  # For this example we pick 75 stocks
p_survial = 0.90
# The weekly periodic returns
pf_rtn = np.zeros([n_weeks, n_sim])
start = default_timer()
for k in range(0, n_sim):
    # Randomly choose n_stock_cand stocks at time zero
    boolean_list = np.array([False] * (n_stocks[0] - n_stock_cand) + [True] * n_stock_cand)
    np.random.shuffle(boolean_list)  # Shuffle the list
    stock_id_this_week = stock_id[idx_new_week[0]:idx_new_week[1]][boolean_list]
    stock_rtn_this_week = stock_rtn[idx_new_week[0]:idx_new_week[1]][boolean_list]
    # This part only simulates the Buzz portfolio names - later we simulate returns from specific holdings of the 75 names
    for j in range(1, n_weeks):
        pf_rtn[j-1, k] = stock_rtn_this_week.mean()
        # Find the number of stocks to keep
        boolean_keep_stocks = np.random.rand(n_stock_cand) < p_survial
        # Next we need to check if a stock is still part of the universe in week j
        stock_cand_temp = stock_id[idx_new_week[j]:idx_new_week[j+1]]
        stock_rtn_temp = stock_rtn[idx_new_week[j]:idx_new_week[j+1]]
        boolean_keep_stocks = boolean_keep_stocks & np.in1d(stock_id_this_week, stock_cand_temp, assume_unique=True)
        n_stocks_to_replace = n_stock_cand - boolean_keep_stocks.sum()  # Number of new stocks to pick this week
        if n_stocks_to_replace > 0:
            # We have to pick from stocks which are not part of the portfolio already
            boolean_cand = np.in1d(stock_cand_temp, stock_id_this_week, assume_unique=True, invert=True)
            n_stocks_to_pick_from = boolean_cand.sum()
            boolean_list = np.array([False] * (n_stocks_to_pick_from - n_stocks_to_replace) + [True] * n_stocks_to_replace)
            np.random.shuffle(boolean_list)  # Shuffle the list
            # First avoid picking the same stock twice, next pick from the unique candidate list
            stock_id_new = stock_cand_temp[boolean_cand][boolean_list]  # The new stocks
            stock_id_this_week = np.hstack((stock_id_this_week[boolean_keep_stocks], stock_id_new))
        # Take the IDs and this week's returns of all holdings (kept and new) from the week-j universe;
        # if nothing was replaced, this only refreshes the returns (the order might differ)
        boolean_hold = np.in1d(stock_cand_temp, stock_id_this_week, assume_unique=True)
        stock_id_this_week = stock_cand_temp[boolean_hold]
        stock_rtn_this_week = stock_rtn_temp[boolean_hold]
    # PnL last period
    pf_rtn[n_weeks-1, k] = stock_rtn_this_week.mean()
print(default_timer() - start)
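On the shuffling question: one possible alternative to building and shuffling a boolean list for every replacement is to draw the new names directly with NumPy's Generator.choice(..., replace=False). The sketch below shows a single weekly replacement step only; the function name replace_dead and its arguments are hypothetical, and it has not been benchmarked against the loop above, so time it on your own data before switching.

```python
import numpy as np


def replace_dead(held_ids, keep_mask, universe_ids, rng):
    """One weekly replacement step (sketch).

    held_ids     : IDs currently in the portfolio (unique)
    keep_mask    : boolean array, True for holdings that survive the random draw
    universe_ids : IDs available this week (unique)
    Returns an array with as many IDs as held_ids.
    """
    # a holding survives only if it passed the random draw and is still in the universe
    kept = held_ids[keep_mask & np.isin(held_ids, universe_ids)]
    n_new = held_ids.size - kept.size
    if n_new == 0:
        return kept
    # candidates: this week's universe minus everything held last week
    candidates = universe_ids[~np.isin(universe_ids, held_ids)]
    # draw the replacements directly, without building a boolean mask
    new = rng.choice(candidates, size=n_new, replace=False)
    return np.concatenate((kept, new))
```

The returns for the resulting IDs can then be looked up from the week's slice in the same way as the boolean_hold step in the loop above.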
