Is it possible to apply k-fold cross-validation when fitting a model and later use that model for prediction?
We have built the structure below, where the model is trained on time-series data between 2020-02-01 and 2021-02-01 and then tested on time-series data between 2021-04-01 and 2021-04-08. Unfortunately, the prediction results are very bad, and when we calculate the evaluation metrics it is obvious something is wrong. Since we only have one year of data to train on, we suspect we are underfitting, and we therefore believe k-fold cross-validation would help us.
What we want exactly is to run k-fold on the period 2020-02-01 to 2021-02-01 and afterwards predict on the period 2021-04-01 to 2021-04-08. By doing this, we would make efficient use of the one year of training data we have and ensure a model fit well enough to make a proper prediction. Is this possible, and, perhaps an even better question, is this the correct way of predicting data?
Our data looks like the following, where datetime is in %Y-%m-%d %H:%M:%S format:
datetime, temperature, solar, humidity, rain, temp_60
2020-02-01 00:00:00, 4.32, 22.84, 82.12, 16.36, 3.12
2020-02-01 00:10:00, 4.38, 21.99, 82.11, 16.25, 3.11
...
2021-11-01 00:00:00, 9.94, 15.43, 82.29, 14.83, 3.11
Code:
import random
import numpy as np
import pandas as pd
from numpy.random import seed
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error, r2_score
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l1
from keras.wrappers.scikit_learn import KerasRegressor
# ArtificialNeuralNetwork is imported from the separate file shown further below

TargetVariable = ['temp_60']
predictors = ['temperature', 'humidity', 'rain', 'solar']
columnAll = ['datetime', 'temperature', 'humidity', 'rain', 'solar', 'temp_60', 'Predicted_temp_60', 'APE']
feature_selection_list = []
def Algorithm(dataframe):
    # Split data
    dateToTrain = dataframe.loc[dataframe['datetime'].between('2020-02-01', '2021-02-01', inclusive=True)]
    dateToPredict = dataframe.loc[dataframe['datetime'].between('2021-04-01', '2021-04-08', inclusive=True)]
    # Initialize the pipeline
    estimator = pipelining()
    # Get values of the predictors and target variable for the training data
    # (datetime between 2020-02-01 and 2021-02-01)
    X_train_year, y_train_year = dateToTrain[predictors].values, dateToTrain[TargetVariable].values
    # Get values of the predictors and target variable for the testing data
    # (datetime between 2021-04-01 and 2021-04-08)
    X_test_all, y_test_all = dateToPredict[predictors].values, dateToPredict[TargetVariable].values
    # ...
    ann = ArtificialNeuralNetwork(predictors=predictors, pipeline=estimator, X_train=X_train_year, y_train=y_train_year)
    annPredictionDataframe = ann.modelFitAndPredict(X_test=X_test_all, y_test=y_test_all)
    # ...
    annPredictionDataframe['datetime'] = dateToPredict['datetime'].values
    kfold_score = ann.evaluateScoreKFold()
    annPredictionDataframe = annPredictionDataframe[columnAll]
    # Evaluate the performance of the Artificial Neural Network (ANN)
    evaluation = performanceEvaluation(y_test_orig=annPredictionDataframe['temp_60'], y_test_pred=annPredictionDataframe['Predicted_temp_60'])
    evaluation["Generalization Error"] = kfold_score.mean()
def pipelining():
    # Standardize the features
    estimators = [('standardize', StandardScaler())]
    estimators.append(('mlp', KerasRegressor(build_fn=make_regression_ann, batch_size=10, epochs=100)))
    # The pipeline can be used like any other estimator
    # and avoids leaking the test set into the train set
    pipeline = Pipeline(estimators)
    return pipeline
def make_regression_ann(initializer='uniform', activation='relu', optimizer='rmsprop', loss='mae', neurons=12):
    # `selection` and `tmp` come from feature-selection code that is not shown here
    inputDim = len(predictors) if selection == 1 or selection == 3 or selection == 4 else len(tmp)
    # create the ANN model
    model = Sequential()
    # Define the input layer and FIRST hidden layer (they are the same)
    # model.add(Dense(units=neurons, input_dim=len(feature_selection_list), kernel_initializer=initializer, activation=activation, activity_regularizer=l1(0.0001)))
    model.add(Dense(units=neurons, input_dim=inputDim, kernel_initializer=initializer, activation=activation, activity_regularizer=l1(0.0001)))
    # Define the second layer of the model;
    # after the first layer we don't have to specify input_dim, as Keras configures it automatically
    model.add(Dense(units=neurons, kernel_initializer=initializer, activation=activation))
    # The output neuron is a single fully connected node,
    # since we will be predicting a single number
    model.add(Dense(1, kernel_initializer=initializer))
    # Compile the model
    model.compile(loss=loss, optimizer=optimizer)
    return model
def performanceEvaluation(y_test_orig, y_test_pred):
    evaluationColumns = ['Coefficient of Determination (R2)', 'Root Mean Square Error (RMSE)', 'Mean Squared Error (MSE)', 'Mean Absolute Percent Error (MAPE)', 'Mean Absolute Error (MAE)', 'Mean Bias Error (MBE)']
    # Computing the Mean Absolute Percent Error (MAPE)
    MAPE = mean_absolute_percentage_error(y_test_orig, y_test_pred)
    # Computing the R2 score
    r2 = r2_score(y_test_orig, y_test_pred)
    # Computing the Mean Squared Error (MSE)
    MSE = mean_squared_error(y_test_orig, y_test_pred)
    # Computing the Root Mean Square Error (RMSE)
    RMSE = mean_squared_error(y_test_orig, y_test_pred, squared=False)
    # Computing the Mean Absolute Error (MAE)
    MAE = mean_absolute_error(y_test_orig, y_test_pred)
    # Computing the Mean Bias Error (MBE)
    MBE = np.mean(y_test_pred - y_test_orig)
    eval_list = [r2, RMSE, MSE, MAPE, MAE, MBE]
    dataframe = pd.DataFrame([eval_list], columns=evaluationColumns)
    return dataframe

if __name__ == '__main__':
    # Setting random seeds for the initial starting weights
    rndseed = 7
    seed(rndseed)
    np.random.seed(rndseed)
    random.seed(rndseed)
    # Retrieve data from csv file
    dataframe = pd.read_csv("../data.csv")
    Algorithm(dataframe=dataframe)
In a separate file, this is where we do the fit and prediction:
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score

class ArtificialNeuralNetwork:
    def __init__(self, predictors, pipeline, X_train, y_train):
        self.predictors = predictors
        self.pipeline = pipeline
        self.X_train = X_train
        self.y_train = y_train

    def evaluateScoreKFold(self):
        cv = KFold(n_splits=10)
        results = cross_val_score(self.pipeline, X=self.X_train, y=self.y_train, cv=cv, scoring="neg_mean_absolute_error")
        print(f'Cross Validation Results: {results}')
        print("Standardized: %.2f (%.2f) MAE" % (results.mean(), results.std()))
        return results

    def modelFitAndPredict(self, X_test, y_test):
        # THIS IS WHERE WE SHOULD PROBABLY DO K-FOLD WITH THE ONE YEAR OF DATA TO TRAIN THIS PIPELINE. IS THIS CORRECTLY ASSUMED???
        self.pipeline.fit(self.X_train, self.y_train)
        # Generate predictions on the testing data
        Predictions = self.pipeline.predict(X_test)
        TestingData = pd.DataFrame(data=X_test, columns=self.predictors)
        TestingData['temp_60'] = y_test
        TestingData['Predicted_temp_60'] = Predictions
        TestingData.head()
        # Compute the absolute percent error
        APE = 100 * (abs(TestingData['temp_60'] - TestingData['Predicted_temp_60']) / TestingData['temp_60'])
        TestingData['APE'] = APE
        # Round all floats to 2 decimals
        TestingData = TestingData.round(2)
        TestingData.to_csv("TestingData.csv")
        return TestingData
The Evaluation Metrics:
Generalization Error      R2         RMSE    MSE     MAPE   MAE    MBE
0.921                    -327.534    6.945   48.229  0.811  5.212  1.835
The purpose of cross-validation (k-fold) is model checking, not model building.
Once you have checked with cross-validation that you obtain similar metrics for every split, you have to train your model with all of your training data.
Maybe what you are looking for is ensemble methods, although I'm not sure how well they apply to neural networks. A clear example is the random forest: many decision trees are trained and their outputs combined.
Finally, just a comment: with neural networks, in general, the more data you have, the better the knowledge the model obtains, so splitting the data to obtain several different models is usually not the best option.
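As a minimal sketch of that workflow (check with cross-validation first, then fit a single final model on all the training data), reusing the estimator pipeline and arrays from the question; note that TimeSeriesSplit is an assumption on my part, since a shuffling KFold can be misleading on time-series data:
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# 1) Model checking: score the pipeline on successive temporal splits
cv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(estimator, X_train_year, y_train_year,
                         cv=cv, scoring="neg_mean_absolute_error")
print("CV MAE: %.3f (+/- %.3f)" % (-scores.mean(), scores.std()))

# 2) Model building: if the splits agree, refit on ALL the training data
#    and only then predict the held-out 2021-04-01 to 2021-04-08 window
estimator.fit(X_train_year, y_train_year)
predictions = estimator.predict(X_test_all)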
Related
I have an Encoder-Decoder LSTM model that learns to predict 12 months of data in advance while looking back 12 months. If it helps at all, my dataset has around 10 years in total (120 months). I keep 8 years for training/validation and 2 years for testing. My understanding is that my model does not have access to the testing data at training time.
The puzzling thing is that my model's predictions are simply a shift of the previous points. But how did my model know the actual previous points at the time of prediction? I did not give the monthly values in the testing set to the model! If we say that it simply copies the previous point which you give as input, then I am saying that I am giving it 12 months with completely different values than the ones it predicts (so it does not copy the 12 months I am giving), yet the forecasted values are shifts of the actual ones (which have never been seen).
My code source is from here:
Below is my code:
# train/test splitting
split_position = int(len(scaled_data) * 0.8)  # 8 years for training
train = scaled_data[0:split_position]
test = scaled_data[split_position:]
print('length of train =', len(train))
print('length of test =', len(test))

# split train and test data into yearly train/test sets (3d) [observation, year, month]
def split_data_yearly(train, test):
    # restructure into windows of yearly data
    train = array(split(train, len(train) / 12))
    test = array(split(test, len(test) / 12))
    return train, test
# evaluate one or more yearly forecasts against expected values
def evaluate_forecasts(actual, predicted):
    scores = list()
    # calculate an RMSE score for each month
    for i in range(actual.shape[1]):
        # calculate mse
        mse = mean_squared_error(actual[:, i], predicted[:, i])
        # calculate rmse
        rmse = math.sqrt(mse)
        # store
        scores.append(rmse)
    # calculate overall RMSE
    s = 0
    for row in range(actual.shape[0]):
        for col in range(actual.shape[1]):
            s += (actual[row, col] - predicted[row, col]) ** 2
    score = math.sqrt(s / (actual.shape[0] * actual.shape[1]))
    # ---------------- plot prediction vs actual ----------------
    predicted = predicted.reshape(predicted.shape[0], predicted.shape[1])
    jump = 12
    inv_scores = list()
    for i in range(len(predicted)):
        sample_predicted = predicted[i, :]
        sample_actual = actual[i, :]
        # inverse normalization
        sample_predicted_inv = scaler.inverse_transform(sample_predicted.reshape(-1, 1))
        sample_actual_inv = scaler.inverse_transform(sample_actual.reshape(-1, 1))
        # print(sample_actual_inv)
        # print(data_sd[(split_position+(i*jump)-1):(split_position+(i*jump-1))+len(sample_actual_inv)])
        # inverse differencing: add back the smoothed series values
        s = numpy.array(smoothed).reshape(-1, 1)
        sample_actual_inv = sample_actual_inv + s[(split_position + (i * jump)):(split_position + (i * jump)) + len(sample_actual_inv)]
        sample_predicted_inv = sample_predicted_inv + s[(split_position + (i * jump)):(split_position + (i * jump)) + len(sample_actual_inv)]
        months = ['August-' + str(19 + i), 'September-' + str(19 + i), 'October-' + str(19 + i), 'November-' + str(19 + i), 'December-' + str(19 + i), 'January-' + str(20 + i), 'February-' + str(20 + i), 'March-' + str(20 + i), 'April-' + str(20 + i), 'May-' + str(20 + i), 'June-' + str(20 + i), 'July-' + str(20 + i)]
        pyplot.plot(months, sample_actual_inv, 'b-', label='Actual')
        pyplot.plot(months, sample_predicted_inv, '--', color="orange", label='Predicted')
        pyplot.legend()
        pyplot.xticks(rotation=25)
        pyplot.title('Encoder Decoder LSTM Prediction', y=1.08)
        pyplot.show()
        # ---------------- determine RMSE after inversion ----------------
        mse = mean_squared_error(sample_actual_inv, sample_predicted_inv)
        rmse = math.sqrt(mse)
        inv_scores.append(rmse)
    return score, scores, inv_scores
# summarize scores
def summarize_scores(name, score, scores):
    s_scores = ', '.join(['%.1f' % s for s in scores])
    print('%s: [%.3f] %s' % (name, score, s_scores))
# convert history into inputs and outputs
def to_supervised(train, n_input, n_out=12):
    # flatten data
    data = train.reshape((train.shape[0] * train.shape[1], train.shape[2]))
    X, y = list(), list()
    in_start = 0
    # step over the entire history one time step at a time
    for _ in range(len(data)):
        # define the end of the input sequence
        in_end = in_start + n_input
        out_end = in_end + n_out
        # ensure we have enough data for this instance
        if out_end <= len(data):
            X.append(data[in_start:in_end, :])
            y.append(data[in_end:out_end, 0])
        # move along one time step
        in_start += 1
    return array(X), array(y)
# train the model
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # take a portion for validation
    val_size = 12
    test_x, test_y = train_x[-val_size:], train_y[-val_size:]
    train_x, train_y = train_x[0:-val_size], train_y[0:-val_size]
    # define parameters
    verbose, epochs, batch_size = 1, 25, 8
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(64, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(64, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    # sgd = optimizers.SGD(lr=0.004, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='mse', optimizer='adam')
    # fit network
    train_history = model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, validation_data=(test_x, test_y), verbose=verbose)
    loss = train_history.history['loss']
    val_loss = train_history.history['val_loss']
    pyplot.plot(loss)
    pyplot.plot(val_loss)
    pyplot.legend(['loss', 'val_loss'])
    pyplot.show()
    return model
# make a forecast
def forecast(model, history, n_input):
    # flatten data
    data = array(history)
    data = data.reshape((data.shape[0] * data.shape[1], data.shape[2]))
    # retrieve the last observations for input data
    input_x = data[-n_input:, :]
    # reshape into [1, n_input, n]
    input_x = input_x.reshape((1, input_x.shape[0], input_x.shape[1]))
    # forecast the next year
    yhat = model.predict(input_x, verbose=0)
    # we only want the vector forecast
    yhat = yhat[0]
    return yhat
# evaluate a single model
def evaluate_model(train, test, n_input):
    # fit model
    model = build_model(train, n_input)
    # history is a list of yearly data
    history = [x for x in train]
    # walk-forward validation over each year
    predictions = list()
    for i in range(len(test)):
        # predict the year
        yhat_sequence = forecast(model, history, n_input)
        # store the predictions
        predictions.append(yhat_sequence)
        # get the real observation and add it to history for predicting the next year
        history.append(test[i, :])
    # evaluate the predictions for each year
    predictions = array(predictions)
    score, scores, inv_scores = evaluate_forecasts(test[:, :, 0], predictions)
    return score, scores, inv_scores
# split into train and test
train, test = split_data_yearly(train, test)
# evaluate model and get scores
n_input = 12
score, scores, inv_scores = evaluate_model(train, test, n_input)
# summarize scores
summarize_scores('lstm', score, scores)
print('RMSE score after inversion:', inv_scores)
# plot scores
months = ['July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March', 'April', 'May', 'June']
# pyplot.plot(months, scores, marker='o', label='lstm')
# pyplot.show()
Differencing is the key here!
After further investigation, I found out that my model produces values that are almost zero for the differenced series (i.e., it is not learning). When I invert the differencing, I am adding (almost) zero to the actual value from the previous timestep, which results in the shifted pattern above.
Therefore, I need to tune my LSTM model so that it actually learns, or maybe remove the zero-valued parts of the data itself, since I have many of those.
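To make the effect concrete, here is a minimal sketch with hypothetical numbers (not taken from the model): if the predicted differences are all close to zero, inverting the differencing just reproduces the previous actual values, i.e., a one-step shift of the series.
import numpy as np

actual = np.array([10.0, 12.0, 15.0, 11.0, 14.0])  # original (undifferenced) series
pred_diff = np.zeros(4)                            # model predicts ~0 for every difference

# invert differencing: prediction(t) = actual(t-1) + predicted_diff(t)
reconstructed = actual[:-1] + pred_diff
print(reconstructed)  # [10. 12. 15. 11.] -> the actual series shifted by one step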
As you can see below, I have two functions. get_data() outputs a data frame with the selected asset's history and passes it to train_model(). Everything works fine, but as the model trains, the accuracy does not seem to change: the loss goes down, but the accuracy stays the same after the second epoch. Even when training with 1000 epochs, the accuracy does not change.
Things I tried changing in this code:
changing the unit count for each of the LSTM layers
trying different data frames from different sources (alpha-vantage)
changing the epoch count
Unfortunately nothing changed.
import os
import numpy as np
import pandas as pd
import tensorflow as tf
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

def train_model(df):
    if not os.path.exists("/py_stuff/"):
        os.makedirs("/py_stuff/")
    checkpoint_filepath = "/py_stuff/check_point"
    weights_checkpoint = "/py_stuff/"
    checkpoint_dir = os.path.dirname(checkpoint_filepath)
    model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_filepath,
        save_weights_only=True,
        monitor='accuracy',
        mode='max',
        save_best_only=True,
        verbose=1)
    dataset_train = df
    training_set = dataset_train.iloc[:, 1:2].values
    sc = MinMaxScaler(feature_range=(0, 1))
    training_set_scaled = sc.fit_transform(training_set)
    X_train = []
    y_train = []
    for i in range(100, len(df)):
        X_train.append(training_set_scaled[i-100:i, 0])
        y_train.append(training_set_scaled[i, 0])
    X_train, y_train = np.array(X_train), np.array(y_train)
    X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
    model = Sequential()
    model.add(LSTM(units=100, return_sequences=True, input_shape=(X_train.shape[1], 1)))
    model.add(Dropout(0.2))
    model.add(LSTM(units=100, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(units=100, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(units=100))
    model.add(Dropout(0.2))
    model.add(Dense(units=1))
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
    # loading weights
    try:
        model.load_weights(checkpoint_filepath)
        print("Weights loaded successfully $$$$$$$ ")
    except:
        print("No Weights Found !!! ")
    model.fit(X_train, y_train, epochs=50, batch_size=100, callbacks=[model_checkpoint_callback])
    # saving weights
    try:
        model.save(checkpoint_filepath)
        model.save_weights(filepath=checkpoint_filepath)
        print("Saving weights and model done ")
    except OSError as no_model:
        print("Error saving weights and model !!!!!!!!!!!! ")
def get_data(CHOICE):
    data = yf.download(  # or pdr.get_data_yahoo(...
        # tickers list or string as well
        tickers=CHOICE,
        # use "period" instead of start/end
        # valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
        # (optional, default is '1mo')
        period="5y",
        # fetch data by interval (including intraday if period < 60 days)
        # valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
        # (optional, default is '1d')
        interval="1d",
        # group by ticker (to access via data['SPY'])
        # (optional, default is 'column')
        group_by='ticker',
        # adjust all OHLC automatically
        # (optional, default is False)
        auto_adjust=True,
        # download pre/post regular market hours data
        # (optional, default is False)
        prepost=True,
        # use threads for mass downloading? (True/False/Integer)
        # (optional, default is True)
        threads=True,
        # proxy URL scheme to use when downloading
        # (optional, default is None)
        proxy=None
    )
    dff = pd.DataFrame(data)
    return dff
df = get_data(CHOICE="BTC-USD")
train_model(df)
From your loss function, it looks like you have a regression network. Your loss is mean squared error, and the accuracy metric does not have any meaning for regression networks; accuracy is only meaningful for classification models. So you can remove metrics=['accuracy'] from your compile call and use the loss value to evaluate your model: if the loss is decreasing, the optimizer is successfully training the network.
You are dealing with a regression problem, where accuracy is not defined.
Accuracy is defined over a finite (or countable) set of classes, as the probability of belonging to a specific class, for example the probability that the output is the digit 9.
In your case, your network outputs a real number, so the notion of accuracy does not make any sense in this context: the probability of your output being exactly 1.000, for example, is 0. (And, surprisingly, a probability of zero does not even mean that the event will never happen!)
Ideally, Keras should return an error saying accuracy is not defined.
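As a minimal sketch of the suggested change, reusing the compile call from the question (reporting MAE here is my own illustrative choice, since it is a metric that does make sense for regression):
# drop 'accuracy'; report mean absolute error instead, which is meaningful for regression
model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['mae'])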
I am trying to forecast a variable called yield spread, "yieldsp", using several macroeconomic variables. "yieldsp" is a column in a dataframe called "stat2" with a datetime index. Initially, I forecasted "yieldsp" using an ARIMA model, for which I employed the following code:
# fit the model on the train set and generate a prediction for each element of the test set.
# perform a rolling forecast: re-create the ARIMA forecast each time a new observation is received.
# forecast(): performs a one-step forecast from the model
# history: a list created to track all the observations, seeded with the training set
# => after each iteration, the new observation is appended to the list "history"
from statsmodels.tsa.arima_model import ARIMA  # older statsmodels API (fit takes disp)

yieldsp = stat2["yieldsp"]
X = yieldsp.values
size = int(len(X) * 0.95)
train, test = X[0:size], X[size:len(X)]
history = [x for x in train]
predictions = list()
# walk-forward validation
for t in range(len(test)):
    model = ARIMA(history, order=(1, 0, 4))
    model_fit = model.fit(disp=0)
    output = model_fit.forecast()
    yhat = output[0]
    predictions.append(yhat)
    obs = test[t]
    history.append(obs)
    print('predicted=%f, expected=%f' % (yhat, obs))
It worked and generated predicted and expected values. A part of the results is shown below:
predicted=0.996081, expected=0.960000
predicted=0.959644, expected=0.940000
predicted=0.937272, expected=0.930000
predicted=0.932651, expected=0.970000
predicted=0.976372, expected=0.960000
predicted=0.961283, expected=0.940000
But now I want to use multiple variables to forecast yieldsp. These variables in "stat2" are:
yieldsp = stat2[['ffr', 'house_st_change','rwage', 'epop_diff2','ipi_change_diff2', 'sahm_diff2']]
ffr house_st_change rwage epop_diff2 ipi_change_diff2 sahm_diff2 yieldsp
Date
1982-03-31 14.68 -28.713629 0.081837 -4.000000e-01 -3.614082 0.227545 0.19
1982-04-30 14.94 -32.573529 0.081789 2.000000e-01 0.838893 -0.061298 0.72
1982-05-31 14.45 -10.087719 0.081752 2.000000e-01 -0.765399 -0.062888 1.74
1982-06-30 14.15 -13.684211 0.080928 -2.000000e-01 0.421589 -0.039439 1.08
1982-07-31 12.59 12.007685 0.081026 -1.421085e-14 -0.141606 -0.032772 3.11
So, I attempted the following:
yieldsp = stat2[['ffr', 'house_st_change', 'rwage', 'epop_diff2', 'ipi_change_diff2', 'sahm_diff2', 'yieldsp']]
X = yieldsp.values
size = int(len(X) * 0.8)
train, test = X[0:size], X[size:len(X)]
history = [x for x in train]
predictions = list()
# walk-forward validation
for t in range(len(test)):
    model = ARIMA(history, order=(1, 0, 1))
    model_fit = model.fit(disp=0)
    output = model_fit.forecast()
    yhat = output[0]
    predictions.append(yhat)
    obs = test[t]
    history.append(obs)
    print('predicted=%f, expected=%f' % (yhat, obs))
But an error occurred:
ValueError: could not broadcast input array from shape (7) into shape (1)
I am unsure how to fix that. I think that to forecast "yieldsp" we would need the forecasted values of the exogenous variables too. And I also think we need to modify the code which states:
history = [x for x in train]
predictions = list()
# walk-forward validation
for t in range(len(test)):
    model = ARIMA(history, order=(1,0,4))
I would appreciate any kind of help.
(Reference: https://machinelearningmastery.com/make-sample-forecasts-arima-python/)
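For what it's worth, here is a minimal sketch of one possible direction (my own assumption, not a verified fix): statsmodels' ARIMA is univariate, so the extra columns cannot be passed as part of the series itself; SARIMAX accepts them through its exog argument instead, and forecasting then requires exog values for the forecasted steps.
from statsmodels.tsa.statespace.sarimax import SARIMAX

endog = stat2['yieldsp']
exog = stat2[['ffr', 'house_st_change', 'rwage', 'epop_diff2', 'ipi_change_diff2', 'sahm_diff2']]

size = int(len(endog) * 0.8)
model = SARIMAX(endog[:size], exog=exog[:size], order=(1, 0, 1))
model_fit = model.fit(disp=False)

# a one-step forecast needs the exogenous values for that step;
# here the actual test-set exog row is used purely for illustration
yhat = model_fit.forecast(steps=1, exog=exog[size:size + 1])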
Hello, I'm trying to complete an assignment that involves training a perceptron (without any hidden layer) to perform binary classification using a sigmoid activation function, but for some reason my code is not working correctly. Although the error decreases after each epoch, the accuracy does not increase. I have target labels 1 and 0, but my predicted labels are almost all close to one; none of my predicted labels represent the 0 class.
Below is my code. Can anyone please tell me what I have done wrong?
# Create a Neural_Network class
class Neural_Network(object):
    def __init__(self, inputSize=2, outputSize=1):
        # size of layers
        self.inputSize = inputSize
        self.outputSize = outputSize
        # weights: shape is (inputSize + 1, outputSize); the +1 is for the bias
        self.W1 = 0.01 * np.random.randn(inputSize + 1, outputSize)  # randomly initialize W1 using numpy's random function

    def feedforward(self, X):  # forward propagation through our network
        n, m = X.shape
        Xbias = np.ones((n, 1))  # bias term in input
        Xnew = np.hstack((Xbias, X))  # add the bias term to the input to match the weight dimensions
        self.product = np.dot(Xnew, self.W1)  # dot product of X (input) and the set of weights
        output = self.sigmoid(self.product)  # apply the activation function (i.e. sigmoid)
        return output  # final output of the network

    def sigmoid(self, s):
        # apply the sigmoid function to s and return its value
        return 1. / (1. + np.exp(-s))

    def sigmoid_derivative(self, s):
        # derivative of sigmoid = sigmoid(x) * (1 - sigmoid(x)); here s is already sigmoid(x)
        return s * (1 - s)

    def backwardpropagate(self, X, Y, y_pred, lr):
        # backward propagate through the network
        # compute the error in the output, using the cross entropy loss function
        self.output_error = self.crossentropy(Y, y_pred)  # output error
        # applying the derivative of sigmoid to the error
        self.error_deriv = self.output_error * self.sigmoid_derivative(y_pred)
        # adjust the set of weights
        n, m = X.shape
        Xbias = np.ones((n, 1))  # bias term in input
        Xnew = np.hstack((Xbias, X))  # add the bias term to the input to match the weight dimensions
        self.W1 += lr * (Xnew.T.dot(self.error_deriv))  # W1 = W1 + learning_rate * error_deriv * input
        # self.W1 += X.T.dot(self.z2_delta)

    def crossentropy(self, Y, Y_pred):
        # compute the error based on the cross entropy loss
        # cross entropy = -sum(Y_actual * log(Y_predicted)) / N; the 1e-6 avoids log(0)
        N = Y_pred.shape[0]
        # cr_entropy = -np.sum(((Y * np.log(Y_pred + 1e-6)) + ((1 - Y) * np.log(1 - Y_pred + 1e-6)))) / N
        cr_entropy = -np.sum(Y * np.log(Y_pred + 1e-6)) / N
        return cr_entropy  # error

    def train(self, trainX, trainY, epochs=100, learningRate=0.001, plot_err=True, validationX=None, validationY=None):
        tr_error = []
        for i in range(epochs):
            # feed forward trainX and receive the predicted values
            y_predicted = self.feedforward(trainX)
            print(i, y_predicted)
            # backpropagation with trainX, trainY, the predicted values and the learning rate
            self.backwardpropagate(trainX, trainY, y_predicted, learningRate)
            tr_error.append(self.output_error)
            print(i, self.output_error)
            print(i, self.W1)
        # if validationX and validationY are not None, show the validation accuracy and error of the model
        # plot the error of the model if plot_err is True
        epocharray = range(0, epochs)
        plt.plot(epocharray, tr_error, 'r', linewidth=3.0)  # plotting error vs. number of epochs
        plt.xlabel('No. of Epochs')
        plt.ylabel('Cross Entropy Error')
        plt.title('Error Vs. Epoch')

    def predict(self, testX):
        # predict the value of testX
        self.ytest_pred = self.feedforward(testX)

    def accuracy(self, testX, testY):
        import math
        # predict the values for testX
        self.ytest_pred1 = self.feedforward(testX)
        acc = 0
        # compare the predictions with testY
        for j in range(len(testY)):
            q = math.ceil(self.ytest_pred1[j])
            # p = round(q)
            if testY[j] == q:
                acc += 1
        accuracy = acc / float(len(testX)) * 100
        print("Percentage Accuracy is", accuracy, "%")
        # compute the accuracy and print it
        return accuracy
# generating dataset point
np.random.seed(1)
no_of_samples = 2000
dims = 2
#Generating random points of values between 0 to 1
class1=np.random.rand(no_of_samples,dims)
#To add separability we will add a bias of 1.1
class2=np.random.rand(no_of_samples,dims)+1.1
class_1_label=np.array([1 for n in range(no_of_samples)])
class_2_label=np.array([0 for n in range(no_of_samples)])
#Lets visualize the dataset
plt.scatter(class1[:,0],class1[:,1], marker='^', label="class 1")
plt.scatter(class2[:,0],class2[:,1], marker='o', label="class 2")
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.legend(loc='best')
plt.show()
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
# Data concatenation
data = np.concatenate((class1,class2),axis=0)
label = np.concatenate((class_1_label,class_2_label),axis=0)
#Note: shuffle this dataset before dividing it into three parts
data,label=shuffle(data,label)
#print(data)
# now using train_test_split command to split data into 60% training data, 20% testing data and 20% validation data
trainX, testX, trainY, testY = train_test_split(data, label, test_size=0.2, random_state=1)
trainX, validX, trainY, validY = train_test_split(trainX, trainY, test_size=0.25, random_state=1)
model = Neural_Network(2,1)
# try different combinations of epochs and learning rate
model.train(trainX, trainY, epochs = 100, learningRate = 0.000001, validationX = validX, validationY = validY)
model.accuracy( testX,testY)
The results come out like this (no label goes near 0):
0 [[0.49670809]
[0.4958389 ]
[0.4966064 ]
...
[0.49537492]
[0.49566927]
[0.4961255 ]]
0 828.1069658303942
0 [[0.48311074]
[0.51907406]
[0.52764299]]
1 [[0.69813116]
[0.91746189]
[0.80408611]
...
[0.74821077]
[0.87150079]
[0.75187736]]
1 250.96538025031356
1 [[0.56983781]
[0.59205773]
[0.60057486]]
2 [[0.72602796]
[0.94067579]
[0.83591236]
...
[0.77916283]
[0.90032058]
[0.78291184]]
2 210.645081151866
2 [[0.63353102]
[0.64265939]
[0.65118627]]
3 [[0.74507968]
[0.95318096]
[0.85588864]
...
[0.79953834]
[0.91705918]
[0.80329027]]
3 186.2933734713245
3 [[0.6846678 ]
[0.68164316]
[0.69020355]]
4 [[0.75952936]
[0.96114086]
[0.87010085]
...
[0.81456476]
[0.92830628]
[0.81829009]]
4 169.32091332021724
4 [[0.72771826]
[0.71342293]
[0.72202744]]
5 [[0.77112943]
[0.96669774]
[0.88093323]
...
[0.82635507]
[0.93649788]
[0.83004119]]
5 156.53923256347372
Please help me solve this problem.
I see you have set the learning rate too small. Set it to 0.001 and increase the epochs to 20k and you will see your model learning well.
Plotting error vs. epochs should give you a better idea of where to stop.
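A minimal sketch of the suggested change, reusing the training calls from the question:
model = Neural_Network(2, 1)
# larger learning rate and many more epochs, as suggested above
model.train(trainX, trainY, epochs=20000, learningRate=0.001,
            validationX=validX, validationY=validY)
model.accuracy(testX, testY)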
I am working on a path prediction problem where I predict the path (latitude, longitude) one time step ahead. I have path data for nearly 1500 "events", which I use to train the LSTM model. For training, since I know the path a priori, I shift the time series by one step and use it as the target vector, as shown in the example and the sketch below. For example:
Event 1:
Lat (t), Lon (t) --> Lat (t+1), Lon (t+1)
Lat (t+1), Lon (t+1) --> Lat (t+2), Lon (t+2)
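For concreteness, here is a minimal sketch of how such shifted targets could be built (the LatLag1Nor/LonLag1Nor column names follow the code below; the groupby on EventID is my own assumption, to keep the shift from leaking across events):
# one-step-ahead targets: shift the normalized coordinates within each event
train_df['LatLag1Nor'] = train_df.groupby('EventID')['LatNor'].shift(-1)
train_df['LonLag1Nor'] = train_df.groupby('EventID')['LonNor'].shift(-1)
train_df = train_df.dropna()  # the last row of each event has no target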
However, for testing, since the path is not known a priori, I take the trained LSTM model, predict one time step at a time, and feed the predicted value back as the input for the next time step. Below are snippets from my code:
# Extract only the Lat, Lon values to arrays
train_full = train_df[['LatNor','LonNor','LatLag1Nor','LonLag1Nor']].values
test_full = test_df[['LatNor','LonNor','LatLag1Nor','LonLag1Nor']].values
print('train_full.shape = ', train_full.shape)
print('test_full.shape = ', test_full.shape)
# Separate the Inputs and Targets
x_train_full = train_full[:,0:2]
y_train_full = train_full[:,2:4]
x_test_full = test_full[:,0:2]
y_test_full = test_full[:,2:4]
# Defining the LSTM model
model = Sequential()
model.add(LSTM(40,input_shape=(None,2), return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(20,input_shape=(None,2), return_sequences=True))
model.add(Dropout(0.1))
model.add(Dense(2))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
model.summary()
epochs = 50
for i in range(epochs):
    print('Running Epoch No: ', i)
    for stormID, data in train_df.groupby('EventID'):
        train = data[['LatNor', 'LonNor', 'LatLag1Nor', 'LonLag1Nor']]
        train = train.values
        x_train = np.expand_dims(train[:, 0:2], axis=0)
        y_train = np.expand_dims(train[:, 2:4], axis=0)
        # print(x_train.shape, y_train.shape)
        model.train_on_batch(x_train, y_train)
        model.reset_states()
print('Model training done.....')
#Use the optimized weights to estimate target values for training data
train_pred = new_model.predict_on_batch(np.expand_dims(train_df[['LatNor','LonNor']].values, axis=0))
train_pred_val = x_scaler.inverse_transform(train_pred[0])
The model trains well (training and validation loss plots not shown here).
When I use the trained model and do a predict_on_batch on the test data, it works great. But in reality we would not know the time series ahead of time. So, when I predict one instance at a time for the test set and use it as the input for the next time step, it does not work well. I suspect I am missing something, and that I am changing the state/weights of the trained network whenever I make a predict call.
x_values = TestDF[['LatNor', 'LonNor']].values
x_values_scaled = x_values
start = x_values_scaled[0, :]
startX = start.reshape(1, 1, 2)
Results = np.empty(())
Results = x_scaler.inverse_transform(startX[0])
for i in range(x_values.shape[0]):
    nextLoc = model.predict(startX)
    nextLoc_rescaled = x_scaler.inverse_transform(nextLoc[0])
    Results = np.vstack((Results, nextLoc_rescaled))
    startX = nextLoc
Any thoughts or recommendations?