I am trying to build an autoencoder for a multidimensional time series. I have followed various templates around the internet and SO, but all of them focus on how to get a model running; I haven't found one on how to get it running and also obtain meaningful results.
I've followed the tutorials starting here: https://blog.keras.io/building-autoencoders-in-keras.html; a practical example here: https://machinelearningmastery.com/lstm-autoencoders/.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector
import matplotlib.pyplot as plt
# this sequence comes out of a MinMaxScaler; a separate question is whether that was a good idea
sequence = np.array([[0.63306452, 0.00714286],
[0.42069892, 0. ],
[0.36155914, 0.15 ],
[0.53629032, 0.12142857],
[0.32526882, 0.24285714],
[0.26344086, 0.52142857],
[0. , 0.79285714],
[0.49731183, 0.71428571],
[0.60080645, 0.25714286],
[0.63037634, 0.11428571],
[0.70698925, 0.26428571],
[0.71774194, 0.21428571],
[0.6155914 , 0.10714286],
[0.56451613, 0.36428571],
[0.66397849, 0.2 ],
[0.76344086, 0.17857143],
[0.66801075, 0.07142857],
[0.66935484, 0.02857143],
[0.90725806, 0.32857143],
[1. , 0.28571429],
[1. , 0.4 ],
[0.81451613, 0.47857143],
[0.41532258, 0.52142857],
[0.55107527, 0.63571429],
[0.42741935, 0.40714286],
[0.56989247, 0.75 ],
[0.76075269, 0.55 ],
[0.69758065, 0.58571429],
[0.73521505, 0.89285714],
[0.77150538, 1. ]])
n_in = len(sequence)
dim_in = sequence.shape[1]
latent_dim = 10
sequence = sequence.reshape((1, n_in, dim_in))
model = Sequential()
model.add(LSTM(latent_dim, input_shape=(n_in, dim_in)))
model.add(RepeatVector(n_in))
model.add(LSTM(dim_in, return_sequences=True))
model.compile(optimizer='adam', loss='mse')
model.summary()
model.fit(sequence, sequence, epochs=1000, verbose=0)
yhat = model.predict(sequence, verbose=0)
plt.figure(1)
plt.subplot(221)
plt.plot(sequence[0, :, 0])
plt.subplot(223)
plt.plot(yhat[0, :, 0])
plt.subplot(222)
plt.plot(sequence[0, :, 1])
plt.subplot(224)
plt.plot(yhat[0, :, 1])
The result I'm getting is not satisfactory (actuals in the upper row; autoencoder output in the lower row):
The decoded series are missing important features (like the spike on the RHS or the drop on the LHS). Given the 'compression ratio' of 30:10 I would expect those events to be reflected somehow. I've tried playing with epochs, batch sizes, and various activations and losses.
Anything obvious I am missing?
I want to run it on a much larger sequence (5000 time points, each point of potentially high dimension). Any tips for this?
Should I change my approach altogether? The author of this blog post https://towardsdatascience.com/autoencoders-for-the-compression-of-stock-market-data-28e8c1a2da3e didn't manage to make it work with an LSTM either...
I built a Keras sequential model on a simple dataset. I am able to train the model, however every time I try to get a prediction on the same input I get different values. Does anyone know why? I read through several related Stack Overflow questions (Why the exactly identical keras model predict different results for the same input data in the same env, Keras saved model predicting different values on different session, different prediction after load a model in keras), but couldn't find the answer. I tried setting the TensorFlow seed and am still getting different results.
Here is my code
from pandas import concat
from pandas import DataFrame
# create sequence
length = 10
sequence = [i/float(length) for i in range(length)]
# create X/y pairs
df = DataFrame(sequence)
df = concat([df, df.shift(1)], axis=1)
df.dropna(inplace=True)
print(df)
# convert to LSTM friendly format
values = df.values
X, y = values[:, 0], values[:, 1]
X = X.reshape(len(X), 1, 1)
print(X.shape, y.shape)
output is:
     0    0
1  0.1  0.0
2  0.2  0.1
3  0.3  0.2
4  0.4  0.3
5  0.5  0.4
6  0.6  0.5
7  0.7  0.6
8  0.8  0.7
9  0.9  0.8
(9, 1, 1) (9,)
Then start building the model
# configure network
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
tf.random.set_seed(1337)
n_batch = len(X)
n_neurons = 10
#design network
model = Sequential()
model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X,y,epochs=2,batch_size=n_batch,verbose=1,shuffle=False)
Now every time I run the following code to get a prediction, I get different results, as you can see here:
model.predict(X)
********output**************
array([[0.03817442],
[0.07164046],
[0.10493257],
[0.13797525],
[0.17069395],
[0.20301574],
[0.23486984],
[0.26618803],
[0.29690543]], dtype=float32)
model.predict(X)
********output**************
array([[0.04415776],
[0.08242793],
[0.12048437],
[0.15823033],
[0.19556962],
[0.2324073 ],
[0.26865062],
[0.3042098 ],
[0.33899906]], dtype=float32)
The problem is setting stateful=True in your LSTM layer, as this keeps the state between predict calls, so each prediction depends on previous predictions.
So as a solution, set stateful=False.
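For illustration, here is a minimal sketch of the same network with stateful=False; it assumes X, y and n_batch are prepared exactly as in the question and is not taken from the original answer. With no state carried over between calls, repeated predictions on the same input match.
from keras.models import Sequential
from keras.layers import Dense, LSTM
model = Sequential()
# with stateful=False the batch size does not need to be fixed in the input shape
model.add(LSTM(10, input_shape=(X.shape[1], X.shape[2]), stateful=False))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=2, batch_size=n_batch, verbose=1, shuffle=False)
# no hidden state carries over between predict calls, so the two outputs are identical
print(model.predict(X))
print(model.predict(X))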
I think this library and the documentation attached to it will be useful for your work.
Based on that library, in a recent project with Keras I started my code as follows:
import os
import numpy as np
from numpy.random import seed
seed(42)
rng = np.random.RandomState(42)
import tensorflow
tensorflow.random.set_seed(42)
os.environ['TF_DETERMINISTIC_OPS'] = '1'
There seemed to be a good deal of determinism in the results, and it was good enough for what I was working on at the time.
Based on #Dr.Snoopy's answer, the problem was setting stateful=True. Setting it to False fixed the issue. From the Keras docs: "Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch." My misunderstanding was that this only applies during training.
Thanks to #Dr.Snoopy for pointing that out.
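If stateful=True is genuinely needed (for example to carry state across batches during training), one hedged alternative, which is my own suggestion rather than part of the answer above, is to clear the carried state explicitly before each prediction pass:
# reset the LSTM state so each predict call starts from the same zero initial state
model.reset_states()
pred_1 = model.predict(X, batch_size=n_batch)
model.reset_states()
pred_2 = model.predict(X, batch_size=n_batch)
# pred_1 and pred_2 should now be identical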
I am replicating a linear autoencoder method based on this example: https://towardsdatascience.com/build-the-right-autoencoder-tune-and-optimize-using-pca-principles-part-ii-24b9cca69bd6
Basically, it uses a one-layer linear autoencoder to compare with PCA. X is randomly generated from a normal distribution with dimension 5. The core part of the code looks like this:
from keras.models import Sequential
from keras.layers import Dense

encoder = Dense(encoding_dim, activation="linear", input_shape=(input_dim,), use_bias=True)
decoder = Dense(input_dim, activation="linear", use_bias=True)
autoencoder = Sequential()
autoencoder.add(encoder)
autoencoder.add(decoder)
autoencoder.compile(metrics=['accuracy'],
                    loss='mean_squared_error',
                    optimizer='sgd')
autoencoder.summary()
autoencoder.fit(X_train_scaled, X_train_scaled,
                epochs=nb_epoch,
                batch_size=batch_size,
                shuffle=True,
                verbose=0)
The feature layer is
w_encoder = autoencoder.layers[0].get_weights()[1]
w_decoder = autoencoder.layers[1].get_weights()[1]
Every time I fit the model, the output for w_encoder is significantly different
Encoder_weights
[[ 0.5596451 -0.7303996 ]
[-0.08105161 0.43715334]
[ 0.7571198 0.4995086 ]
[-0.68543106 0.0496945 ]
[-0.46657953 0.1231109 ]]
Decoder_weights
[[ 0.5596451 -0.7303996 ]
[-0.08105161 0.43715334]
[ 0.7571198 0.4995086 ]
[-0.68543106 0.0496945 ]
[-0.46657953 0.1231109 ]]
vs
Encoder_weights
[[ 0.49870995 -0.594432 ]
[-0.03552848 0.3591121 ]
[ 0.6754906 0.42547104]
[-0.5236658 0.02657888]
[-0.36780515 0.07721919]]
Decoder_weights
[[ 0.49870995 -0.594432 ]
[-0.03552848 0.3591121 ]
[ 0.6754906 0.42547104]
[-0.5236658 0.02657888]
[-0.36780515 0.07721919]]
Is there any way to make the layer metrics stable between runs?
This can happen for many reasons.
The weights of your network are initialized randomly each time, so it is possible to get different results on each run.
Your data loader may also be random in nature, pulling samples in a different order each time.
If you want reproducibility, try the following:
Use seeds.
import os
import random
import numpy as np
import tensorflow as tf

SEED = 1997
os.environ['PYTHONHASHSEED'] = str(SEED)
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)
In the case of running on an Nvidia GPU, you should also use tensorflow-determinism
pip install tensorflow-determinism
and you use it like this:
import tensorflow as tf
import os
os.environ['TF_DETERMINISTIC_OPS'] = '1'
For TensorFlow < 2.1, add the above and also this:
from tfdeterminism import patch
patch()
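As a usage sketch (my own illustration, with hypothetical helper names set_seeds and fit_once), re-seeding before each run of the question's autoencoder should make the extracted encoder weights repeatable; it assumes X_train_scaled, encoding_dim, input_dim, nb_epoch and batch_size come from the question's code.
import os
import random
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense

def set_seeds(seed=1997):
    # re-apply all seeds so every run starts from the same RNG state
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

def fit_once():
    model = Sequential([
        Dense(encoding_dim, activation="linear", input_shape=(input_dim,), use_bias=True),
        Dense(input_dim, activation="linear", use_bias=True),
    ])
    model.compile(loss='mean_squared_error', optimizer='sgd')
    model.fit(X_train_scaled, X_train_scaled, epochs=nb_epoch,
              batch_size=batch_size, shuffle=True, verbose=0)
    return model.layers[0].get_weights()[0]

set_seeds()
w1 = fit_once()
set_seeds()
w2 = fit_once()
print(np.allclose(w1, w2))  # expected True once seeding (and deterministic ops) is in place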
I am inspecting the output of BatchNormalization in Keras.
My model is:
#Import libraries
import numpy as np
import keras
from keras import layers
from keras.layers import Input, Dense, Activation, BatchNormalization, Flatten, Conv2D
from keras.models import Model
#Model
def HappyModel3(input_shape):
    X_input = Input(input_shape, name='input_layer')
    X = BatchNormalization(axis=1, name='batchnorm_layer')(X_input)
    X = Dense(1, activation='sigmoid', name='sigmoid_layer')(X)
    model = Model(inputs=X_input, outputs=X, name='HappyModel3')
    return model
Compiling the model (number of epochs is 1):
X_train=np.array([[1,1,-1],[2,1,1]])
Y_train=np.array([0,1])
happyModel_1=HappyModel3(X_train[0].shape)
happyModel_1.compile(optimizer=keras.optimizers.RMSprop(), loss=keras.losses.mean_squared_error)
happyModel_1.fit(x = X_train, y = Y_train, epochs = 1 , batch_size = 2, verbose=0 )
Finding the Batch Normalization layer's output for the model with epochs=1:
for i in range(0, len(happyModel_1.layers)):
    tmp_model = Model(happyModel_1.layers[0].input, happyModel_1.layers[i].output)
    tmp_output = tmp_model.predict(X_train)
    if i in (0, 1):
        print(happyModel_1.layers[i].name)
        print(tmp_output.shape)
        print(tmp_output)
        print('\n')
Code Output is:
input_layer
(2, 3)
[[ 1. 1. -1.]
[ 2. 1. 1.]]
batchnorm_layer
(2, 3)
[[ 0.99003249 0.99388224 -0.99551398]
[ 1.99647105 0.99388224 0.9971655 ]]
We've normalized at axis=1. In the batch norm layer output above, the 1st dimension's mean (across the batch) is about 1.5, the 2nd dimension's mean is about 1, and the 3rd dimension's mean is about 0. Since this is batch norm, I expected the mean to be close to 0 for all 3 dimensions.
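For reference, the per-feature means can be checked directly from the printed output (a small numpy check with the values pasted from above): they are close to the raw input means, not to 0.
import numpy as np
bn_out = np.array([[0.99003249, 0.99388224, -0.99551398],
                   [1.99647105, 0.99388224,  0.9971655 ]])
print(bn_out.mean(axis=0))  # approx. [1.493, 0.994, 0.001]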
The means do get close to 0 when I increase the epochs to 1000:
happyModel_2=HappyModel3(X_train[0].shape)
happyModel_2.compile(optimizer=keras.optimizers.RMSprop(), loss=keras.losses.mean_squared_error)
happyModel_2.fit(x = X_train, y = Y_train, epochs = 1000 , batch_size = 2, verbose=0 )
Finding the Batch Normalization layer's output for the model with epochs=1000:
for i in range(0, len(happyModel_2.layers)):
    tmp_model = Model(happyModel_2.layers[0].input, happyModel_2.layers[i].output)
    tmp_output = tmp_model.predict(X_train)
    if i in (0, 1):
        print(happyModel_2.layers[i].name)
        print(tmp_output.shape)
        print(tmp_output)
        print('\n')
#Code output
input_layer
(2, 3)
[[ 1. 1. -1.]
[ 2. 1. 1.]]
batchnorm_layer
(2, 3)
[[ -1.95576239e+00 8.08715820e-04 -1.86621261e+00]
[ 1.95795488e+00 8.08715820e-04 1.86590290e+00]]
We've normalized at axis=1. Now, at axis=1, the batch norm layer output has a mean of about 0 in the 1st, 2nd and 3rd dimensions. THIS IS THE EXPECTED OUTPUT NOW.
My question is: Is output of Batch Normalization in Keras dependent on number of epochs?
(Probably yes: since we do backpropagation, the batch normalization parameters will be affected by increasing the number of epochs.)
The Keras documentation for BatchNormalization gives an answer to your question:
Importantly, batch normalization works differently during training and
during inference.
What happens during training, i.e. when calling model.fit()?
During training [...], the layer normalizes its output
using the mean and standard deviation of the current batch of inputs.
But what will happen during inference, i.e. when calling model.predict() as in your examples?
During inference [...], the layer normalizes its output using a moving average of
the mean and standard deviation of the batches it has seen during
training. That is to say, it returns (batch - self.moving_mean) / (self.moving_var + epsilon) * gamma + beta.
self.moving_mean and self.moving_var are non-trainable variables that
are updated each time the layer is called in training mode [...].
It's important to understand that batch normalization estimates the statistics (mean and variance) of your whole training data during training by looking at the statistics of single batches and internally updating the moving_mean and moving_variance parameters via a running average computed from those single-batch statistics. Therefore they are not affected by backpropagation. Ideally, after your model has seen enough training examples (or done enough training epochs), moving_mean and moving_variance will correspond to the statistics of your whole training set. These two parameters are then used during inference to normalize test examples. At the start of training the two parameters are initialized to 0 and 1. Furthermore, batch norm has two more parameters called gamma and beta, which are updated by the optimizer and therefore depend on your loss.
In essence, yes, the output of batch normalization during inference depends on the number of epochs you have trained your model for: first, due to the changing moving averages for mean and variance, and second, due to the learned parameters gamma and beta.
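To see this concretely, here is a minimal sketch, assuming the layer was built with the default center=True, scale=True and epsilon=1e-3 (in which case get_weights() returns [gamma, beta, moving_mean, moving_variance]); note that the actual inference computation divides by the square root of the moving variance plus epsilon.
import numpy as np
bn_layer = happyModel_2.get_layer('batchnorm_layer')
gamma, beta, moving_mean, moving_var = bn_layer.get_weights()
X_train = np.array([[1, 1, -1], [2, 1, 1]], dtype='float32')
# reproduce the inference-time batch norm output by hand
manual = gamma * (X_train - moving_mean) / np.sqrt(moving_var + 1e-3) + beta
print(manual)  # should closely match the batchnorm_layer output printed above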
For a deeper understanding of how batch normalization works and why it is needed, have a look at the original publication.
I'm training my Keras model to predict whether, given the provided data, it will make a shot or not, represented so that 0 means no and 1 means yes. However, when I try to predict I get float values instead.
I've tried feeding in data that is exactly the same as the training data in order to get 1, but it does not work.
I used the data below and tried one-hot encoding the target.
https://github.com/eijaz1/Deep-Learning-in-Keras-Tutorial/blob/master/keras_tutorial.ipynb
import pandas as pd
from keras.utils import to_categorical
from keras.models import load_model
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
#read in training data
train_df_2 = pd.read_csv('diabetes_data.csv')
#view data structure
train_df_2.head()
#create a dataframe with all training data except the target column
train_X_2 = train_df_2.drop(columns=['diabetes'])
#check that the target variable has been removed
train_X_2.head()
#one-hot encode target column
train_y_2 = to_categorical(train_df_2.diabetes)
#check that the target column has been converted
train_y_2[0:5]
#create model
model_2 = Sequential()
#get number of columns in training data
n_cols_2 = train_X_2.shape[1]
#add layers to model
model_2.add(Dense(250, activation='relu', input_shape=(n_cols_2,)))
model_2.add(Dense(250, activation='relu'))
model_2.add(Dense(250, activation='relu'))
model_2.add(Dense(2, activation='softmax'))
#compile model using accuracy to measure model performance
model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
early_stopping_monitor = EarlyStopping(patience=3)
model_2.fit(train_X_2, train_y_2, epochs=30, validation_split=0.2, callbacks=[early_stopping_monitor])
train_dft = pd.read_csv('diabetes_data - Copy.csv')
train_dft.head()
test_y_predictions = model_2.predict(train_dft)
print(test_y_predictions)
I wanted to get
[[0,1]
[1,0]]
However, I am getting
[[0.8544417 0.14555828]
[0.9312985 0.06870154]]
Additionally, can anyone explain to me what this value 0.8544417 means?
Actually, you may interpret the output of a model with a softmax classifier at the top as the confidence scores or probabilities of classes (because the softmax function normalizes the values such that they would be positive and have a sum of 1). So, when you provide the model with a true label of [1, 0] this means that this sample belongs to class 1 with probability of 1, and it belongs to class 2 with probability of zero. Therefore, during training the optimization process tries to get as close as possible to that label, but it would never exactly reach [1,0] (actually due to softmax it might get as close as [0.999999, 0.000001], but never [1, 0]).
But that is not a problem, because we are only interested in getting close enough and knowing the class with the highest probability, which we take as the model's prediction. You can easily do that by finding the index of the class with the maximum probability:
import numpy as np
preds = model.predict(some_data)
class_preds = np.argmax(preds, axis=-1) # e.g. for [max,min] it gives 0, for [min,max] it gives 1
Further, if you are interested to convert predictions to either [0,1] or [1,0] for any reason, you can just round the values:
import numpy as np
preds = model.predict(some_data)
round_preds = np.around(preds) # this would convert [0.87, 0.13] to [1., 0.]
Note: rounding only works properly with two classes, and not when you have more than two classes (e.g. [0.3, 0.4, 0.3] would become [0, 0, 0] after rounding).
Note 2: Since you are creating the model using Sequential API of Keras, then as an alternative to argmax approach described above you can directly use model.predict_classes(some_data) which gives you the exact same output.
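If a one-hot vector is still needed in the multi-class case mentioned in the first note, a small sketch (my own example values) is to go through argmax and then build the one-hot vector, instead of rounding the raw probabilities:
import numpy as np
preds = np.array([[0.3, 0.4, 0.3]])           # example softmax output for 3 classes
class_idx = np.argmax(preds, axis=-1)         # -> [1]
one_hot = np.eye(preds.shape[-1])[class_idx]  # -> [[0., 1., 0.]]
print(class_idx, one_hot)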
I'm using Keras to build and train a recurrent neural network.
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Masking
from keras.layers.recurrent import LSTM
#build and train model
in_dimension = 3
hidden_neurons = 300
out_dimension = 2
model = Sequential()
model.add(Masking([0,0,0], input_shape=(max_sequence_length, in_dimension)))
model.add(LSTM(hidden_neurons, return_sequences=True, input_shape=(max_sequence_length, in_dimension)))
model.add(LSTM(hidden_neurons, return_sequences=False))
model.add(Dense(out_dimension))
model.add(Activation('softmax'))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
model.fit(padded_training_seqs, training_final_steps, nb_epoch=5, batch_size=1)
padded_training_seqs is an array of sequences of [latitude, longitude, temperature], all padded to the same length with values of [0,0,0]. When I train this network, the first epoch gives me a loss of about 63, and it increases after more epochs.
This is causing a model.predict call later in the code to give values that are completely off from the training values. For example, most of the training values in each sequence are around [40, 40, 20], but the RNN consistently outputs values around [0.4, 0.5], which makes me think something is wrong with the masking layer.
The training X (padded_training_seqs) data looks something like this (only much larger):
[
[[43.103, 27.092, 19.078], [43.496, 26.746, 19.198], [43.487, 27.363, 19.092], [44.107, 27.779, 18.487], [44.529, 27.888, 17.768]],
[[44.538, 27.901, 17.756], [44.663, 28.073, 17.524], [44.623, 27.83, 17.401], [44.68, 28.034, 17.601], [0,0,0]],
[[47.236, 31.43, 13.905], [47.378, 31.148, 13.562], [0,0,0], [0,0,0], [0,0,0]]
]
and the training Y (training_final_steps) data looks like this:
[
[44.652, 39.649], [37.362, 54.106], [37.115, 57.66501]
]
I am somewhat certain that you're misusing the Masking layer from Keras. Check the documentation here for more details.
Try using a masking layer like:
model.add(Masking(0, input_shape=(max_sequence_length, in_dimension)))
because I believe it just needs the scalar masking value for a timestep, not the entire timestep vector (i.e. [0,0,0]).
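For completeness, here is a hedged sketch of that suggested change in context, reusing the variable names from the question (max_sequence_length, in_dimension, hidden_neurons, out_dimension): mask_value is a scalar, and a timestep is skipped when all of its features equal that value, so the [0,0,0] padding rows are ignored.
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense, Activation

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(max_sequence_length, in_dimension)))
model.add(LSTM(hidden_neurons, return_sequences=True))
model.add(LSTM(hidden_neurons, return_sequences=False))
model.add(Dense(out_dimension))
model.add(Activation('softmax'))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")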
Best of luck.