AutoEncoder feature layers are unstable - python

I am replicating a linear autoencoder method based on this example: https://towardsdatascience.com/build-the-right-autoencoder-tune-and-optimize-using-pca-principles-part-ii-24b9cca69bd6
Basically, it uses a one-layer linear autoencoder to compare with PCA. X is randomly generated from a normal distribution with dimension 5. The core part of the code looks like this:
encoder = Dense(encoding_dim, activation="linear", input_shape=(input_dim,), use_bias = True)
decoder = Dense(input_dim, activation="linear", use_bias = True)
autoencoder = Sequential()
autoencoder.add(encoder)
autoencoder.add(decoder)
autoencoder.compile(metrics=['accuracy'],
                    loss='mean_squared_error',
                    optimizer='sgd')
autoencoder.summary()
autoencoder.fit(X_train_scaled, X_train_scaled,
                epochs=nb_epoch,
                batch_size=batch_size,
                shuffle=True,
                verbose=0)
The feature-layer weights are read out as:
w_encoder = autoencoder.layers[0].get_weights()[1]
w_decoder = autoencoder.layers[1].get_weights()[1]
Every time I fit the model, the output for w_encoder is significantly different:
Encoder_weights
[[ 0.5596451 -0.7303996 ]
[-0.08105161 0.43715334]
[ 0.7571198 0.4995086 ]
[-0.68543106 0.0496945 ]
[-0.46657953 0.1231109 ]]
Decoder_weights
[[ 0.5596451 -0.7303996 ]
[-0.08105161 0.43715334]
[ 0.7571198 0.4995086 ]
[-0.68543106 0.0496945 ]
[-0.46657953 0.1231109 ]]
vs
Encoder_weights
[[ 0.49870995 -0.594432 ]
[-0.03552848 0.3591121 ]
[ 0.6754906 0.42547104]
[-0.5236658 0.02657888]
[-0.36780515 0.07721919]]
Decoder_weights
[[ 0.49870995 -0.594432 ]
[-0.03552848 0.3591121 ]
[ 0.6754906 0.42547104]
[-0.5236658 0.02657888]
[-0.36780515 0.07721919]]
Is there any way to make the layer weights stable between runs?

This can happen for many reasons.
The weights of your network are initialized randomly each time, so it's possible to get different results on each run.
Your data loader is random in nature and pulls samples in a different order each run.
If you want reproducibility, try the following.
Use seeds:
import os
import random
import numpy as np
import tensorflow as tf

SEED = 1997
os.environ['PYTHONHASHSEED'] = str(SEED)  # Python hash seed
random.seed(SEED)                         # Python RNG
np.random.seed(SEED)                      # NumPy RNG
tf.random.set_seed(SEED)                  # TensorFlow RNG
If you are running on an NVIDIA GPU, you should also use tensorflow-determinism:
pip install tensorflow-determinism
and you use it like this:
import tensorflow as tf
import os
os.environ['TF_DETERMINISTIC_OPS'] = '1'
For TensorFlow < 2.1, add the above and also:
from tfdeterminism import patch
patch()
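As an aside (this assumes a newer TensorFlow version than the answer above targets): TF 2.7+ ships a one-call helper that seeds Python, NumPy, and TensorFlow together, and TF 2.8+ can enable deterministic ops without the external package.
import tensorflow as tf

SEED = 1997
tf.keras.utils.set_random_seed(SEED)            # seeds Python, NumPy and TF in one call (TF >= 2.7)
tf.config.experimental.enable_op_determinism()  # request deterministic (GPU) ops (TF >= 2.8)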

Related

Is output of Batch Normalization in Keras dependent on number of epochs?

I am examining the output of BatchNormalization in Keras.
My model is:
#Import libraries
import numpy as np
import keras
from keras import layers
from keras.layers import Input, Dense, Activation, BatchNormalization, Flatten, Conv2D
from keras.models import Model
#Model
def HappyModel3(input_shape):
    X_input = Input(input_shape, name='input_layer')
    X = BatchNormalization(axis=1, name='batchnorm_layer')(X_input)
    X = Dense(1, activation='sigmoid', name='sigmoid_layer')(X)
    model = Model(inputs=X_input, outputs=X, name='HappyModel3')
    return model
Compiling the model | here the number of epochs is 1:
X_train=np.array([[1,1,-1],[2,1,1]])
Y_train=np.array([0,1])
happyModel_1=HappyModel3(X_train[0].shape)
happyModel_1.compile(optimizer=keras.optimizers.RMSprop(), loss=keras.losses.mean_squared_error)
happyModel_1.fit(x = X_train, y = Y_train, epochs = 1 , batch_size = 2, verbose=0 )
Finding the Batch Normalization layer's output for the model with epochs=1:
for i in range(0, len(happyModel_1.layers)):
    tmp_model = Model(happyModel_1.layers[0].input, happyModel_1.layers[i].output)
    tmp_output = tmp_model.predict(X_train)
    if i in (0, 1):
        print(happyModel_1.layers[i].name)
        print(tmp_output.shape)
        print(tmp_output)
        print('\n')
Code Output is:
input_layer
(2, 3)
[[ 1. 1. -1.]
[ 2. 1. 1.]]
batchnorm_layer
(2, 3)
[[ 0.99003249 0.99388224 -0.99551398]
[ 1.99647105 0.99388224 0.9971655 ]]
We've normalized at axis=1. Batch norm layer output: at axis=1, the 1st dimension mean is 1.5, the 2nd dimension mean is 1, and the 3rd dimension mean is 0.
Since it's batch norm, I expect the mean to be close to 0 for all 3 dimensions.
This happens when I increase epochs to 1000:
happyModel_2=HappyModel3(X_train[0].shape)
happyModel_2.compile(optimizer=keras.optimizers.RMSprop(), loss=keras.losses.mean_squared_error)
happyModel_2.fit(x = X_train, y = Y_train, epochs = 1000 , batch_size = 2, verbose=0 )
Finding the Batch Normalization layer's output for the model with epochs=1000:
for i in range(0, len(happyModel_2.layers)):
    tmp_model = Model(happyModel_2.layers[0].input, happyModel_2.layers[i].output)
    tmp_output = tmp_model.predict(X_train)
    if i in (0, 1):
        print(happyModel_2.layers[i].name)
        print(tmp_output.shape)
        print(tmp_output)
        print('\n')
#Code output
input_layer
(2, 3)
[[ 1. 1. -1.]
[ 2. 1. 1.]]
batchnorm_layer
(2, 3)
[[ -1.95576239e+00 8.08715820e-04 -1.86621261e+00]
[ 1.95795488e+00 8.08715820e-04 1.86590290e+00]]
We've normalized at axis=1. Now, at axis=1, the batch norm layer output has: 1st dimension mean 0, 2nd dimension mean 0, 3rd dimension mean 0. THIS IS THE EXPECTED OUTPUT NOW.
My question is: Is output of Batch Normalization in Keras dependent on number of epochs?
(Probably YES: since we do backpropagation, the Batch Normalization parameters will be affected by increasing the number of epochs.)
The Keras documentation for BatchNormalization gives an answer to your question:
Importantly, batch normalization works differently during training and
during inference.
What happens during training, i.e. when calling model.fit()?
During training [...], the layer normalizes its output
using the mean and standard deviation of the current batch of inputs.
But what will happen during inference, i.e. when calling model.predict() as in your examples?
During inference [...], the layer normalizes its output using a moving average of
the mean and standard deviation of the batches it has seen during
training. That is to say, it returns (batch - self.moving_mean) / (self.moving_var + epsilon) * gamma + beta.
self.moving_mean and self.moving_var are non-trainable variables that
are updated each time the layer is called in training mode [...].
It's important to understand that batch normalization estimates the statistics (mean and variance) of your whole training data during training by looking at the statistics of single batches and internally updating the moving_mean and moving_variance parameters via a running average computed from the single-batch statistics. Therefore they're not affected by backpropagation. Ideally, after your model has seen enough training examples (or done enough training epochs), moving_mean and moving_variance will correspond to the statistics of your whole training set. These two parameters are then used during inference to normalize test examples. At the start of training, the two parameters are initialized to 0 and 1. Furthermore, batch norm has two more parameters, gamma and beta, which are updated by the optimizer and therefore depend on your loss.
In essence, yes, the output of batch normalization during inference depends on the number of epochs you have trained your model: first, due to the changing moving averages for mean and variance, and second, due to the learned parameters gamma and beta.
For a deeper understanding of how batch normalization works and why it is needed, have a look at the original publication.
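To see this directly, here is a quick inspection sketch (assuming the default BatchNormalization configuration, where get_weights() returns [gamma, beta, moving_mean, moving_variance]):
# inspect the moving statistics after training
bn_layer = happyModel_2.get_layer('batchnorm_layer')
gamma, beta, moving_mean, moving_var = bn_layer.get_weights()
print(moving_mean)   # approaches the per-feature mean of X_train, i.e. [1.5, 1.0, 0.0]
print(moving_var)    # approaches the per-feature variance of X_train
print(X_train.mean(axis=0), X_train.var(axis=0))   # compare against the batch statistics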

Cannot understand issue! Input tensor must be at least 2D

I have written this simple program to make a prediction (do not mind that there is no train/test split). x is a 2D input array of shape (40k, 4).
import numpy as np
from numpy import load
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
tf.get_logger().setLevel('INFO')
tf.autograph.set_verbosity(1)
x = load('dataset/metadata/x.npy')
y = load('dataset/metadata/y.npy')
meta_model = keras.Sequential(
    [
        layers.Dense(3, activation='relu'),
        layers.Dense(2, activation='relu'),
        layers.Dense(1)
    ]
)
meta_model.compile(
    loss=keras.losses.MeanSquaredError(),
    optimizer=keras.optimizers.Adam(lr=0.001),
    metrics=[tf.keras.metrics.MeanSquaredError()]
)
meta_model.fit(x, y, batch_size=25, epochs=10, verbose=2)
for i in range(10):
    print(y[i], " vs ", meta_model(x[i]))
In the final few lines I am attempting to make the model output a prediction (I am aware that the prediction is happening on the same data the model was trained on; I am simply trying to get the model to work). I cannot understand why I am getting the following error (on the last line):
Input tensor must be at least 2D: [3]
Can anyone help explain what I am doing incorrectly?
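As a hedged aside (not part of the original post): a likely cause is that Keras models expect a leading batch dimension, so passing the 1D row x[i] gives an input that is not at least 2D. A minimal sketch of a batched call, assuming x has shape (N, features):
for i in range(10):
    pred = meta_model(x[i:i+1])              # slicing keeps a batch dimension of 1, shape (1, features)
    print(y[i], " vs ", float(pred[0, 0]))   # pred has shape (1, 1)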

Tuning LSTM autoencoder performance

I am trying to build an autoencoder of a multidimensional time series. I have followed various templates around the internet and SO, but all of them focus on how to get it running; I haven't found one on how to get it running and obtain meaningful results.
I've followed the tutorials starting here: https://blog.keras.io/building-autoencoders-in-keras.html; a practical example here: https://machinelearningmastery.com/lstm-autoencoders/.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector
import matplotlib.pyplot as plt
# this sequence comes out of a MinMaxScaler. A separate question is if this was a good idea?
sequence = np.array([[0.63306452, 0.00714286],
[0.42069892, 0. ],
[0.36155914, 0.15 ],
[0.53629032, 0.12142857],
[0.32526882, 0.24285714],
[0.26344086, 0.52142857],
[0. , 0.79285714],
[0.49731183, 0.71428571],
[0.60080645, 0.25714286],
[0.63037634, 0.11428571],
[0.70698925, 0.26428571],
[0.71774194, 0.21428571],
[0.6155914 , 0.10714286],
[0.56451613, 0.36428571],
[0.66397849, 0.2 ],
[0.76344086, 0.17857143],
[0.66801075, 0.07142857],
[0.66935484, 0.02857143],
[0.90725806, 0.32857143],
[1. , 0.28571429],
[1. , 0.4 ],
[0.81451613, 0.47857143],
[0.41532258, 0.52142857],
[0.55107527, 0.63571429],
[0.42741935, 0.40714286],
[0.56989247, 0.75 ],
[0.76075269, 0.55 ],
[0.69758065, 0.58571429],
[0.73521505, 0.89285714],
[0.77150538, 1. ]])
n_in = len(sequence)
dim_in = sequence.shape[1]
latent_dim = 10
sequence = sequence.reshape((1, n_in, dim_in))
model = Sequential()
model.add(LSTM(latent_dim, input_shape=(n_in, dim_in)))
model.add(RepeatVector(n_in))
model.add(LSTM(dim_in, return_sequences=True))
model.compile(optimizer='adam', loss='mse')
model.summary()
model.fit(sequence, sequence, epochs=1000, verbose=0)
yhat = model.predict(sequence, verbose=0)
plt.figure(1)
plt.subplot(221)
plt.plot(sequence[0, :, 0])
plt.subplot(223)
plt.plot(yhat[0, :, 0])
plt.subplot(222)
plt.plot(sequence[0, :, 1])
plt.subplot(224)
plt.plot(yhat[0, :, 1])
The result I'm getting is not satisfactory (actuals in the upper row; autoencoder output in the lower row).
The decoded series are missing important features (like the spike on the RHS or the drop on the LHS). Given the 'compression ratio' of 30:10, I would expect those events to be somehow reflected. I've tried playing with epochs, batch sizes, and various activations and losses.
Anything obvious I am missing?
I want to run it on a much larger sequence (5000 time points, each point of potentially high dimension). Any tips for this?
Should I change my approach altogether? The author of this blog post https://towardsdatascience.com/autoencoders-for-the-compression-of-stock-market-data-28e8c1a2da3e didn't manage to make it work with LSTM either...
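One variant worth trying (a sketch only, based on the decoder pattern in the machinelearningmastery tutorial cited above; whether it helps on this data is untested): keep the decoder LSTM at the latent width and add a TimeDistributed Dense read-out per timestep, instead of shrinking the decoder LSTM to dim_in.
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

# assumes the same sequence, n_in, dim_in and latent_dim defined above
model = Sequential()
model.add(LSTM(latent_dim, input_shape=(n_in, dim_in)))   # encoder
model.add(RepeatVector(n_in))                             # repeat the latent vector for each timestep
model.add(LSTM(latent_dim, return_sequences=True))        # decoder keeps the latent width
model.add(TimeDistributed(Dense(dim_in)))                 # per-timestep projection back to dim_in
model.compile(optimizer='adam', loss='mse')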

Rounding Error at a python neural network made by Keras

I am learning Python by using the Keras high-level neural networks library.
I made a simple neural network with only one neuron for a linear classification problem with two inputs and one output.
My network works well, but if I calculate the prediction myself (to demonstrate how the NN works), using the weights of the network, there is a small difference between my own "implementation" of the neural network's firing and Keras's. E.g.:
Using predict() method on trained network:
testset = np.array([[5],[1]])
prediction = model.predict(testset.transpose())
print prediction
The result is:
[[ 0.22708023]]
Calculating the result myself:
# get the weights of the neural network
for layer in model.layers:
    weights = layer.get_weights()
# the math of the prediction
prediction_calc = testset[0]*weights[0][0] + testset[1]*weights[0][1] + 1*weights[1]
print prediction_calc
The result is:
[ 0.22708024]
Why is there this small difference between the two values? The neural network shouldn't be doing anything more than what I do when calculating the prediction_calc variable.
I think it has something to do with casting between variable types. If I print the weights variable, I see it is a list of float32 matrices.
[array([[-0.07256483],
[ 0.02924729]], dtype=float32), array([ 0.56065708], dtype=float32)]
I can't figure out where this difference comes from; it should be a simple rounding error, but I don't know how to avoid it.
For reference, here is the whole code:
import numpy as np
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# load and transpose input data set
inputset = np.loadtxt("learningsets/hyperplane2d_INPUTS.csv", delimiter=",")
inputset = inputset.transpose()
# load output data set
outputset = np.loadtxt("learningsets/hyperplane2d_OUTPUTS.csv", delimiter=",")
outputset = outputset
# build the simple NN
from keras.models import Sequential
model = Sequential()
from keras.layers import Dense
# The neuron
# Dense: 2 inputs, 1 outputs . Linear activation
model.add(Dense(output_dim=1, input_dim=2, activation="linear"))
model.compile(loss='mean_squared_error', metrics=['accuracy'], optimizer='sgd')
model.fit(inputset, outputset, nb_epoch=20, batch_size=10)
# calculate prediction for one input
testset = np.array([[5],[1]])
predictions = model.predict(testset.transpose())
print predictions
# get the weights of the neural network
for layer in model.layers:
    weights = layer.get_weights()
# the math of the prediction
prediction_calc = testset[0]*weights[0][0] + testset[1]*weights[0][1] + 1*weights[1]
print prediction_calc
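A minimal sketch (my assumption, consistent with the float32 weights shown above): Keras computes the forward pass in float32, while the manual NumPy calculation above runs in float64 because testset holds 64-bit values, so the last printed digit can differ. Redoing the calculation in float32 should reproduce the predict() output:
import numpy as np

testset32 = testset.astype(np.float32)               # cast the input to float32, like Keras does
W, b = weights[0], weights[1]                        # kernel (2, 1) and bias (1,), both float32
prediction_calc32 = testset32.transpose().dot(W) + b # same math, but in float32 throughout
print(prediction_calc32)                             # should now agree with the predict() output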

Keras extremely high loss, not decreasing with each epoch

I'm using Keras to build and train a recurrent neural network.
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Masking
from keras.layers.recurrent import LSTM
#build and train model
in_dimension = 3
hidden_neurons = 300
out_dimension = 2
model = Sequential()
model.add(Masking([0,0,0], input_shape=(max_sequence_length, in_dimension)))
model.add(LSTM(hidden_neurons, return_sequences=True, input_shape=(max_sequence_length, in_dimension)))
model.add(LSTM(hidden_neurons, return_sequences=False))
model.add(Dense(out_dimension))
model.add(Activation('softmax'))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
model.fit(padded_training_seqs, training_final_steps, nb_epoch=5, batch_size=1)
padded_training_seqs is an array of sequences of [latitude, longitude, temperature], all padded to the same length with values of [0,0,0]. When I train this network, the first epoch gives me a loss of about 63, and the loss increases over further epochs.
This is causing a model.predict call later in the code to give values that are completely off from the training values. For example, most of the training values in each sequence are around [40, 40, 20], but the RNN consistently outputs values around [0.4, 0.5], which makes me think something is wrong with the masking layer.
The training X (padded_training_seqs) data looks something like this (only much larger):
[
[[43.103, 27.092, 19.078], [43.496, 26.746, 19.198], [43.487, 27.363, 19.092], [44.107, 27.779, 18.487], [44.529, 27.888, 17.768]],
[[44.538, 27.901, 17.756], [44.663, 28.073, 17.524], [44.623, 27.83, 17.401], [44.68, 28.034, 17.601], [0,0,0]],
[[47.236, 31.43, 13.905], [47.378, 31.148, 13.562], [0,0,0], [0,0,0], [0,0,0]]
]
and the training Y (training_final_steps) data looks like this:
[
[44.652, 39.649], [37.362, 54.106], [37.115, 57.66501]
]
I am somewhat certain that you're misusing the Masking layer from Keras. Check the Keras documentation for more details.
Try using a masking layer like:
model.add(Masking(0, input_shape=(max_sequence_length, in_dimension)))
because I believe it just needs a scalar mask value for the timestep dimension, not the entire timestep vector (i.e. [0,0,0]).
Best of luck.
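For what it's worth, a small sketch (using tf.keras 2.x for illustration, which is newer than the Keras version in the question): Masking takes a scalar mask_value and skips a timestep only when every feature in that timestep equals it, so padding rows of [0, 0, 0] are masked while real data are kept.
import numpy as np
from tensorflow.keras.layers import Masking

# one sequence of 3 timesteps; the last timestep is all-zero padding
x = np.array([[[43.1, 27.0, 19.0],
               [44.5, 27.9, 17.8],
               [ 0.0,  0.0,  0.0]]])
masked = Masking(mask_value=0.0)(x)
print(masked._keras_mask.numpy())   # [[ True  True False]] -- only the padded timestep is masked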
