I created an LSTM model for intraday stock prediction. My training data has the shape (290, 4). I did all the preprocessing: normalizing the data, taking the first difference, and using a window size of 4.
This is a sample of my input data.
X = array([[0, 0, 0, 0],
[array([ 0.19]), 0, 0, 0],
[array([-0.35]), array([ 0.19]), 0, 0],
...,
[array([ 0.11]), array([-0.02]), array([-0.13]), array([-0.09])],
[array([-0.02]), array([ 0.11]), array([-0.02]), array([-0.13])],
[array([ 0.07]), array([-0.02]), array([ 0.11]), array([-0.02])]], dtype=object)
y = array([[array([ 0.19])],
[array([-0.35])],
[array([-0.025])],
...,
[array([-0.02])],
[array([ 0.07])],
[array([-0.04])]], dtype=object)
Note: I am feeding in, as well as predicting, the differenced values, so the input values are in the range (-0.5, 0.5).
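Roughly, the differencing and windowing I did looks like this (a simplified sketch, not my exact code; I difference the already-normalized 'Close' column and slide a window of size 4):

import numpy as np

# Simplified sketch of the preprocessing: difference the (already normalized)
# close prices, then slide a window of size 4 over the differenced series.
diff = df['Close'].diff().dropna().values
window = 4
X = np.array([diff[i:i + window] for i in range(len(diff) - window)])
y = diff[window:].reshape(-1, 1)   # the next difference is the target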
Here is my Keras LSTM model:
dim_in = 4
dim_out = 1

model = Sequential()
model.add(LSTM(input_shape=(1, dim_in),
               return_sequences=True,
               units=6))
model.add(Dropout(0.2))
model.add(LSTM(batch_input_shape=(1, features.shape[1], features.shape[2]),
               return_sequences=False,
               units=6))
model.add(Dropout(0.3))
model.add(Dense(activation='linear', units=dim_out))
model.compile(loss='mse', optimizer='rmsprop')

for i in range(300):
    # print("Completed :", i + 1, "/", 300, "Steps")
    model.fit(X, y, epochs=1, batch_size=1, verbose=2, shuffle=False)
    model.reset_states()
I feed in the last sequence value of shape (1, 4) and predict the output.
This is my prediction code:
base_value = df.iloc[290]['Close']
prediction = []
orig_pred = []
input_data = np.copy(test[0,:])
input_data = input_data.reshape(len(input_data),1)
for i in range(100):
    inp = input_data[i:, :]
    inp = inp.reshape(1, 1, inp.shape[0])
    y = model.predict(inp)
    orig_pred.append(y[0][0])
    input_data = np.insert(input_data, [i + 4], y[0][0], axis=0)
    base_value = base_value + y
    prediction.append(base_value[0][0])
sqrt(mean_squared_error(test_output, orig_pred))
RMSE = 0.10592485833344527
Here is the visualization of the difference prediction along with the stock price prediction (figures omitted):
fig. 1: the LSTM (difference) prediction
fig. 2: the stock price prediction
I am not sure why it keeps predicting the same output value after about 10 iterations. Maybe it is the vanishing gradient problem, maybe I am feeding in too little input data (about 290 samples), or maybe there is a problem in the model architecture; I am not sure.
Please help me understand how to get a reasonable result.
Thank you !!!
I don't work with Keras, but looking through your code and plots it seems like the complexity of your network might not be high enough to fit the data. Try enlarging the network with more units and also try larger window sizes.
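For example, a wider version of the model from the question might look something like this (just a sketch; the unit counts and window size are arbitrary starting points to experiment with, not tuned values):

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

dim_in = 4    # try a larger window here as well, e.g. 8 or 16
dim_out = 1
model = Sequential()
model.add(LSTM(64, input_shape=(1, dim_in), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(64, return_sequences=False))
model.add(Dropout(0.3))
model.add(Dense(dim_out, activation='linear'))
model.compile(loss='mse', optimizer='rmsprop')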
Because your regressor minimizes the cost function by simply replicating the feature you give it as input. For example, if you have a BTC closing value of $6340 at time t, it will predict that (or some value close to it) at t+1. Make sure you are not giving the regressor a direct numerical hint of what the predicted label should be, especially when working with time-series data.
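One quick sanity check (a sketch; test_input and test_output are assumed names for the windowed test inputs and targets) is to compare the model's RMSE against a naive persistence baseline that simply repeats the most recent difference:

from math import sqrt
from sklearn.metrics import mean_squared_error

# Naive baseline: predict that the next difference equals the most recent one.
# In the sample shown above, the most recent difference is the first column.
naive_pred = test_input[:, 0]
print('baseline RMSE:', sqrt(mean_squared_error(test_output, naive_pred)))

If the LSTM's RMSE is not clearly lower than this baseline's, the network has mostly learned to copy its input.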
I am trying to find the output of a BatchNormalization layer in Keras.
My model is:
# Import libraries
import numpy as np
import keras
from keras import layers
from keras.layers import Input, Dense, Activation, BatchNormalization, Flatten, Conv2D
from keras.models import Model

# Model
def HappyModel3(input_shape):
    X_input = Input(input_shape, name='input_layer')
    X = BatchNormalization(axis = 1, name = 'batchnorm_layer')(X_input)
    X = Dense(1, activation='sigmoid', name='sigmoid_layer')(X)
    model = Model(inputs = X_input, outputs = X, name='HappyModel3')
    return model
Compiling and fitting the model (number of epochs is 1):
X_train=np.array([[1,1,-1],[2,1,1]])
Y_train=np.array([0,1])
happyModel_1=HappyModel3(X_train[0].shape)
happyModel_1.compile(optimizer=keras.optimizers.RMSprop(), loss=keras.losses.mean_squared_error)
happyModel_1.fit(x = X_train, y = Y_train, epochs = 1 , batch_size = 2, verbose=0 )
Finding the batch normalization layer's output for the model trained with epochs=1:
for i in range(0, len(happyModel_1.layers)):
    tmp_model = Model(happyModel_1.layers[0].input, happyModel_1.layers[i].output)
    tmp_output = tmp_model.predict(X_train)
    if i in (0, 1):
        print(happyModel_1.layers[i].name)
        print(tmp_output.shape)
        print(tmp_output)
        print('\n')
Code Output is:
input_layer
(2, 3)
[[ 1. 1. -1.]
[ 2. 1. 1.]]
batchnorm_layer
(2, 3)
[[ 0.99003249 0.99388224 -0.99551398]
[ 1.99647105 0.99388224 0.9971655 ]]
We normalized at axis=1. Looking at the batch norm layer's output along axis=1: the 1st feature's mean is about 1.5, the 2nd feature's mean is about 1, and the 3rd feature's mean is about 0.
Since this is batch norm, I expected the mean to be close to 0 for all 3 features.
This happens when I increase epochs to 1000:
happyModel_2=HappyModel3(X_train[0].shape)
happyModel_2.compile(optimizer=keras.optimizers.RMSprop(), loss=keras.losses.mean_squared_error)
happyModel_2.fit(x = X_train, y = Y_train, epochs = 1000 , batch_size = 2, verbose=0 )
Finding the batch normalization layer's output for the model trained with epochs=1000:
for i in range(0, len(happyModel_2.layers)):
    tmp_model = Model(happyModel_2.layers[0].input, happyModel_2.layers[i].output)
    tmp_output = tmp_model.predict(X_train)
    if i in (0, 1):
        print(happyModel_2.layers[i].name)
        print(tmp_output.shape)
        print(tmp_output)
        print('\n')
#Code output
input_layer
(2, 3)
[[ 1. 1. -1.]
[ 2. 1. 1.]]
batchnorm_layer
(2, 3)
[[ -1.95576239e+00 8.08715820e-04 -1.86621261e+00]
[ 1.95795488e+00 8.08715820e-04 1.86590290e+00]]
We normalized at axis=1. Now, along axis=1, the batch norm layer's output has a mean of approximately 0 for the 1st, 2nd, and 3rd features. THIS IS THE EXPECTED OUTPUT NOW.
My question is: does the output of BatchNormalization in Keras depend on the number of epochs?
(Probably yes: since we do backpropagation, the batch normalization parameters will be affected by increasing the number of epochs.)
The keras documentation for BatchNormalization gives an answer to your question:
Importantly, batch normalization works differently during training and
during inference.
What happens during training, i.e. when calling model.fit()?
During training [...], the layer normalizes its output
using the mean and standard deviation of the current batch of inputs.
But what will happen during inference, i.e. when calling model.predict() as in your examples?
During inference [...], the layer normalizes its output using a moving average of
the mean and standard deviation of the batches it has seen during
training. That is to say, it returns (batch - self.moving_mean) / (self.moving_var + epsilon) * gamma + beta.
self.moving_mean and self.moving_var are non-trainable variables that
are updated each time the layer is called in training mode [...].
It's important to understand that batch normalization estimates the statistics (mean and variance) of your whole training data during training by looking at the statistics of single batches and internally updating the moving_mean and moving_variance parameters via a running average computed from those single-batch statistics. Therefore they are not affected by backpropagation. Ideally, after your model has seen enough training examples (or has done enough training epochs), moving_mean and moving_variance will correspond to the statistics of your whole training set. These two parameters are then used during inference to normalize test examples. At the start of training, the two parameters are initialized to 0 and 1.
Batch norm also has two more parameters, gamma and beta, which are updated by the optimizer and therefore depend on your loss.
In essence, yes, the output of batch normalization during inference depends on the number of epochs you have trained your model for: first, because the moving averages of the mean and variance change, and second, because of the learned parameters gamma and beta.
For a deeper understanding of how batch normalization works and why it is needed, have a look at the original publication.
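If you want to see this effect on your own models, you can inspect the layer's parameters directly; a small sketch (with default settings, a Keras BatchNormalization layer stores its weights in the order [gamma, beta, moving_mean, moving_variance]):

# Inspect the batch norm layer's parameters after training.
bn = happyModel_2.get_layer('batchnorm_layer')
gamma, beta, moving_mean, moving_variance = bn.get_weights()
print('moving_mean:', moving_mean)
print('moving_variance:', moving_variance)
print('gamma:', gamma, 'beta:', beta)

Running this for both happyModel_1 and happyModel_2 should show how the moving statistics drift toward the training-set statistics as training continues.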
I have found an implementation of Monte Carlo Dropout in PyTorch. The main idea of this method is to set the model's dropout layers to train mode at test time, which allows a different dropout mask to be used on each of the various forward passes.
The implementation illustrates how the predictions from the various forward passes are stacked together and used to compute different uncertainty metrics.
import sys
import numpy as np
import torch
import torch.nn as nn

def enable_dropout(model):
    """ Function to enable the dropout layers during test-time """
    for m in model.modules():
        if m.__class__.__name__.startswith('Dropout'):
            m.train()

def get_monte_carlo_predictions(data_loader,
                                forward_passes,
                                model,
                                n_classes,
                                n_samples):
    """ Function to get the monte-carlo samples and uncertainty estimates
    through multiple forward passes

    Parameters
    ----------
    data_loader : object
        data loader object from the data loader module
    forward_passes : int
        number of monte-carlo samples/forward passes
    model : object
        pytorch model
    n_classes : int
        number of classes in the dataset
    n_samples : int
        number of samples in the test set
    """
    dropout_predictions = np.empty((0, n_samples, n_classes))
    softmax = nn.Softmax(dim=1)

    for i in range(forward_passes):
        predictions = np.empty((0, n_classes))
        model.eval()
        enable_dropout(model)
        for batch_idx, (image, label) in enumerate(data_loader):
            image = image.to(torch.device('cuda'))
            with torch.no_grad():
                output = model(image)
                output = softmax(output)  # shape (n_samples, n_classes)
            predictions = np.vstack((predictions, output.cpu().numpy()))
        dropout_predictions = np.vstack((dropout_predictions,
                                         predictions[np.newaxis, :, :]))
    # dropout_predictions - shape (forward_passes, n_samples, n_classes)

    # Calculating mean across multiple MCD forward passes
    mean = np.mean(dropout_predictions, axis=0)  # shape (n_samples, n_classes)

    # Calculating variance across multiple MCD forward passes
    variance = np.var(dropout_predictions, axis=0)  # shape (n_samples, n_classes)

    epsilon = sys.float_info.min
    # Calculating entropy across multiple MCD forward passes
    entropy = -np.sum(mean * np.log(mean + epsilon), axis=-1)  # shape (n_samples,)

    # Calculating mutual information across multiple MCD forward passes
    mutual_info = entropy - np.mean(np.sum(-dropout_predictions * np.log(dropout_predictions + epsilon),
                                           axis=-1), axis=0)  # shape (n_samples,)
What I'm trying to do is calculate the accuracy across the different forward passes. Can anyone please help me figure out how to get the accuracy, and what changes (if any) are needed to the dimensions used in this implementation?
I am using the CIFAR10 dataset and would like to use dropout at test time. Here is the code for the data_loader:
# Loading the test set
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=test_transform)
data_loader = torch.utils.data.DataLoader(testset, batch_size=n_samples, shuffle=False, num_workers=4)
Accuracy is the percentage of correctly classified samples. You can create a boolean array that indicates whether a certain prediction is equal to its corresponding reference value, and you can get the mean of these values to calculate accuracy. I have provided a code example of this below.
import numpy as np
# 2 forward passes, 4 samples, 3 classes
# shape is (2, 4, 3)
dropout_predictions = np.asarray([
[[0.2, 0.1, 0.7], [0.1, 0.5, 0.4], [0.9, 0.05, 0.05], [0.25, 0.74, 0.01]],
[[0.1, 0.5, 0.4], [0.2, 0.6, 0.2], [0.8, 0.10, 0.10], [0.25, 0.01, 0.74]]
])
# Get the predicted value for each sample in each forward pass.
# Shape of output is (2, 4).
classes = dropout_predictions.argmax(-1)
# array([[2, 1, 0, 1],
# [1, 1, 0, 2]])
# Test equality between the reference values and the predicted classes.
# y_true broadcasts against classes, so the result also has shape (2, 4).
y_true = np.asarray([2, 1, 0, 1])
elementwise_equal = np.equal(y_true, classes)
# array([[ True, True, True, True],
# [False, True, True, False]])
# Calculate the accuracy for each forward pass.
# Shape is (2,).
elementwise_equal.mean(axis=1)
# array([1. , 0.5])
In the example above, you can see that the accuracy for the first forward pass was 100%, and the accuracy for the second forward pass was 50%.
@jakub's answer is correct. However, I wanted to propose an alternative approach that may be better, especially for more nascent researchers.
Scikit-learn comes with many built-in performance measurement functions, including accuracy. To get those approaches to work with PyTorch, you only need to convert your torch tensors to numpy arrays:
x = torch.Tensor(...) # Fill-in as needed
x_np = x.numpy() # Convert to numpy
Then, you simply import scikit-learn:
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)
This simply returns 0.5. Easy peasy and less likely to have a bug.
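To tie this back to the Monte Carlo Dropout code above, you could, for example, compute one accuracy score per forward pass (a sketch; it assumes you return dropout_predictions from get_monte_carlo_predictions, which the original function does not do, and that recent torchvision exposes the CIFAR10 labels as testset.targets):

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array(testset.targets)   # labels in data_loader order (shuffle=False)

# dropout_predictions has shape (forward_passes, n_samples, n_classes)
per_pass_accuracy = [accuracy_score(y_true, pass_probs.argmax(axis=-1))
                     for pass_probs in dropout_predictions]

# Accuracy of the ensembled (mean) prediction across all passes:
mean_accuracy = accuracy_score(y_true, dropout_predictions.mean(axis=0).argmax(axis=-1))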
I want to evaluate a TensorFlow classification model.
To compute the accuracy, I have the following code:
predictions = tf.argmax(logits, axis=-1, output_type=tf.int32)
accuracy = tf.metrics.accuracy(labels=label_ids, predictions=predictions)
It works well for single-label classification, but now I want to do multilabel classification, where my labels are arrays of integers instead of single integers.
Here is an example of a label, [0, 1, 1, 0, 1, 0], as stored in label_ids, and an example of the predictions, [0.1, 0.8, 0.9, 0.1, 0.6, 0.2], from the tensor logits.
What function should I use instead of argmax to do this? (My labels are arrays of 6 integers, each either 0 or 1.)
If needed, we can suppose that there is a threshold of 0.5.
It is probably better to do this type of post-processing evaluation outside of tensorflow, where it is more natural to try several different thresholds.
If you want to do it in tensorflow, you can consider:
predictions = tf.math.greater(logits, tf.constant(0.5))
This will return a tensor of the original logits shape with True for all entries greater than 0.5. You can then calculate accuracy as before. This is suitable for cases where many labels can be simultaneously true for a given sample.
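For example, an element-wise multilabel accuracy could be computed like this (a sketch; whether you count every label independently or require all six labels of a sample to match is a design choice, so both variants are shown):

# Threshold the logits at 0.5 and compare element-wise with the labels.
predictions = tf.cast(tf.math.greater(logits, tf.constant(0.5)), tf.int32)
labels = tf.cast(label_ids, tf.int32)
per_label_accuracy = tf.reduce_mean(
    tf.cast(tf.equal(predictions, labels), tf.float32))      # each label counted separately
per_sample_accuracy = tf.reduce_mean(
    tf.cast(tf.reduce_all(tf.equal(predictions, labels), axis=-1), tf.float32))  # all labels must match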
Use the code below to calculate accuracy in multiclass classification:
tf.argmax returns the index of the largest value along the given axis, for both y_pred and y_true (the actual y).
tf.equal is then used to find the matches (it returns True/False).
Convert the booleans to floats (i.e. 0 or 1) and use tf.reduce_mean to calculate the accuracy.
correct_mask = tf.equal(tf.argmax(y_pred,1), tf.argmax(y_true,1))
accuracy = tf.reduce_mean(tf.cast(correct_mask, tf.float32))
Edit
Example with data:
import numpy as np
y_pred = np.array([[0.1,0.5,0.4], [0.2,0.6,0.2], [0.9,0.05,0.05]])
y_true = np.array([[0,1,0],[0,0,1],[1,0,0]])
correct_mask = tf.equal(tf.argmax(y_pred,1), tf.argmax(y_true,1))
accuracy = tf.reduce_mean(tf.cast(correct_mask, tf.float32))
with tf.Session() as sess:
    # print(sess.run([correct_mask]))
    print(sess.run([accuracy]))
Output:
[0.6666667]
I have been reading about Keras RNN models (LSTMs and GRUs), and authors seem to largely focus on language data or univariate time series that use training instances composed of previous time steps. The data I have is a bit different.
I have 20 variables measured every year for 10 years for 100,000 persons as input data, and the 20 variables measured for year 11 as output data. What I would like to do is predict the value of one of the variables (not the other 19) for the 11th year.
I have my data structured as X.shape = [persons, years, variables] = [100000, 10, 20] and Y.shape = [persons, variable] = [100000, 1]. Below is my Python code for a LSTM model.
## LSTM model.
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(128, activation = 'tanh',
input_shape = (X.shape[1], X.shape[2])))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X, Y, epochs = 25, batch_size = 128)
I have four (related) questions, please:
Have I coded the Keras model correctly for the data structure I have? The performance I get from a fully-connected network (using flattened data) and from LSTM, GRU, and 1D CNN models are nearly identical, and I don't know if I have made an error in Keras or if a recurrent model is simply not helpful in this case.
Should I have Y as a series with shape Y.shape = [persons, years] = [100000, 11], rather than including the variable in X, which would then have shape X.shape = [persons, years, variables] = [100000, 10, 19]? If so, how can I get the RNN to output the predicted sequence? When I use return_sequences = True, Keras returns an error.
Is this the best way to predict with the data I have? Are there better option choices available in the Keras RNN models, or even other models?
How could I simulate data resembling the data structure I have so that a RNN model would outperform a fully-connected network?
UPDATE:
I have tried a simulation, with what I hope is a very simple case where an RNN should be expected to outperform a FNN.
While the LSTM tends to outperform the FNN when both have fewer hidden units (4), the performance becomes identical with more hidden units (8+). Can anyone think of a better simulation where an RNN would be expected to outperform an FNN with a similar data structure?
from keras import models
from keras import layers
from keras.layers import Dense, LSTM
import numpy as np
import matplotlib.pyplot as plt
The code below simulates data for 10,000 instances, 10 time steps, and 2 variables. If the second variable has a 0 in the very first time step, then Y is the value of the first variable for the very last time step multiplied by 3. If the second variable has a 1 in the very first time step, then Y is the value of the first variable for the very last time step multiplied by 9.
My hope was that the RNN would keep the value of the second variable at the very first time step in memory and use it to know which value (3 or 9) to multiply the first variable at the very last time step by.
## Simulate data.
instances = 10000
sequences = 10

X = np.zeros((instances, sequences * 2))
X[:int(instances / 2), 1] = 1

for i in range(instances):
    for j in range(0, sequences * 2, 2):
        X[i, j] = np.random.random()

Y = np.zeros((instances, 1))
for i in range(len(Y)):
    if X[i, 1] == 0:
        Y[i] = X[i, -2] * 3
    if X[i, 1] == 1:
        Y[i] = X[i, -2] * 9
Below is code for a FNN:
## Densely connected model.
# Define model.
network_dense = models.Sequential()
network_dense.add(layers.Dense(4, activation = 'relu',
input_shape = (X.shape[1],)))
network_dense.add(Dense(1, activation = None))
# Compile model.
network_dense.compile(optimizer = 'rmsprop', loss = 'mean_absolute_error')
# Fit model.
history_dense = network_dense.fit(X, Y, epochs = 100, batch_size = 256, verbose = False)
plt.scatter(Y[X[:, 1] == 0, :], network_dense.predict(X[X[:, 1] == 0, :]), alpha = 0.1)
plt.plot([0, 3], [0, 3], color = 'black', linewidth = 2)
plt.title('FNN, Second Variable has a 0 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y[X[:, 1] == 1, :], network_dense.predict(X[X[:, 1] == 1, :]), alpha = 0.1)
plt.plot([0, 9], [0, 9], color = 'black', linewidth = 2)
plt.title('FNN, Second Variable has a 1 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
Below is code for a LSTM:
## Structure X data for LSTM.
X_lstm = X.reshape(X.shape[0], X.shape[1] // 2, 2)
X_lstm.shape
## LSTM model.
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(4, activation = 'relu',
input_shape = (X_lstm.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'rmsprop', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X_lstm, Y, epochs = 100, batch_size = 256, verbose = False)
plt.scatter(Y[X[:, 1] == 0, :], network_lstm.predict(X_lstm[X[:, 1] == 0, :]), alpha = 0.1)
plt.plot([0, 3], [0, 3], color = 'black', linewidth = 2)
plt.title('LSTM, Second Variable has a 0 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y[X[:, 1] == 1, :], network_lstm.predict(X_lstm[X[:, 1] == 1, :]), alpha = 0.1)
plt.plot([0, 9], [0, 9], color = 'black', linewidth = 2)
plt.title('LSTM, Second Variable has a 1 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
Yes, the code used is correct for what you are trying to do. Ten years is the time window used to predict the following year, so that should be the number of time steps fed into your model for each of the 20 variables. The sample size of 100,000 observations is not relevant to the input shape of your model.
The way that you had originally shaped the dependent variable Y is correct. You are predicting a window of 1 year for 1 variable, and you have 100,000 observations. The keyword argument return_sequences=True will cause an error to be thrown because you only have a single LSTM layer. Set this parameter to True only if you are stacking multiple LSTM layers and the layer in question is followed by another LSTM layer.
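For illustration, a stacked variant (just a sketch reusing the variable names from the question; the 64-unit second layer is an arbitrary choice) would look like:

# return_sequences=True is only needed on an LSTM layer that feeds another LSTM layer.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(128, activation = 'tanh', return_sequences = True,
                             input_shape = (X.shape[1], X.shape[2])))
network_lstm.add(layers.LSTM(64, activation = 'tanh'))   # last LSTM: return_sequences defaults to False
network_lstm.add(layers.Dense(1, activation = None))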
I wish I could offer some guidance on question 3, but without actually having your dataset I don't know if it's possible to answer this with any sort of certainty.
I will say that LSTMs were designed to address what is known as the long-term dependency problem present in regular RNNs. What this problem boils down to is that as the gap grows between when the relevant information was observed and the point where that information would be useful, the standard RNN has a harder time learning the relationship between them. Think of predicting a stock price based on 3 days of activity versus an entire year.
This leads into question 4. If I use the term 'resembling' loosely and stretch your time window further out to, say, 50 years instead of 10, the advantages gained from using an LSTM would become more apparent. Although I'm sure that someone more experienced will be able to offer a better answer, and I look forward to seeing it.
I found this page helpful for understanding LSTM's:
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
I am learning AI with Python and have this situation: I created a deep learning model that has 10 neurons in its input layer and 3 neurons in its output layer. I split my data into 80% for training and 20% for testing.
The trained model is ready for testing.
Until now, I have always had only one neuron in the output layer, so I tested the accuracy this way:
classifier = Sequential()
# ...
classifier.add(Dense(units = 3, kernel_initializer = 'uniform', activation = 'sigmoid'))
# ...
y_pred = classifier.predict(np.array(X_test))
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
which works great when the output layer has only ONE value in each prediction.
In my case, I have 3 values in each prediction.
y_pred = array([[3.142904686503911194e-11, 1.000000000000000000e+00, 1.729809626091548085e-16],
                [7.398544450698540942e-12, 1.000000000000000000e+00, 1.776427415878292515e-22],
                [4.224535246066807304e-07, 1.000000000000000000e+00, 7.929732391553923065e-12]])
And I want to compare it to my expected values, which are:
y_test = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
So, I could make this work manually:
Put a 1 in place of the highest value in each prediction and 0 everywhere else.
Compare the two vectors row by row.
But it seems like there must be a better way to do it?
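For reference, the manual approach I am describing would be roughly this (a sketch using numpy):

import numpy as np

# Turn each prediction into the index of its largest value and compare it with
# the index of the 1 in the corresponding one-hot expected vector.
pred_classes = np.argmax(y_pred, axis=1)
true_classes = np.argmax(np.array(y_test), axis=1)
accuracy = np.mean(pred_classes == true_classes)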
You want to measure how "close" the prediction vector is to the expected vector. A good formula that describes the "amount of difference" between two vectors is to check the magnitude (or square magnitude) of the delta vector (prediction - expected).
In this case, you can do something like this:
def square_magnitude(vector):
    return sum(x * x for x in vector)

def inaccuracy(pred, test):  # should only get equal-length items
    return square_magnitude([pred[i] - test[i] for i in range(len(pred))]) / len(pred)
Since you have three samples:
total_inaccuracy = sum(inaccuracy(y_pred[i], y_test[i]) for i in range(len(y_pred))) / len(y_pred)
This should be 0 when it's perfectly accurate and higher (positive) when it's less accurate.