This is my data X_train, prepared for an LSTM, with shape (7000, 2, 200):
[[[0.500858 0. 0.5074856 ... 1. 0.4911533 0. ]
[0.4897923 0. 0.48860878 ... 0. 0.49446714 1. ]]
[[0.52411383 0. 0.52482396 ... 0. 0.48860878 1. ]
[0.4899698 0. 0.48819458 ... 1. 0.4968341 1. ]]
...
[[0.6124623 1. 0.6118705 ... 1. 0.6328777 0. ]
[0.6320492 0. 0.63512635 ... 1. 0.6960175 0. ]]
[[0.6118113 1. 0.6126989 ... 0. 0.63512635 1. ]
[0.63530385 1. 0.63595474 ... 1. 0.69808865 0. ]]]
I create my sequential model
model = Sequential()
model.add(LSTM(units = 50, activation = 'relu', input_shape = (X_train.shape[1], 200)))
model.add(Dropout(0.2))
model.add(Dense(1, activation = 'linear'))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
Then I fit my model:
history = model.fit(
X_train,
Y_train,
epochs = 20,
batch_size = 200,
validation_data = (X_test, Y_test),
verbose = 1,
shuffle = False,
)
model.summary()
And at the end I can see something like this:
Layer (type) Output Shape Param #
=================================================================
lstm_16 (LSTM) (None, 2, 50) 50200
dropout_10 (Dropout) (None, 2, 50) 0
dense_10 (Dense) (None, 2, 1) 51
Why does the output shape have None as its first element? Is that a problem, or is it supposed to be like this? What does it affect, and how can I change it?
I would appreciate any help, thanks!
The first dimension of a Keras output shape is always reserved for the batch size. When you build the model it does not know what batch size you will use, so that dimension is left as None, which means "any size". To see why it has to be flexible, suppose your dataset has 1000 samples and your batch size is 32. 1000 / 32 = 31.25, so you get 31 full batches of 32 samples, which covers 32 * 31 = 992 samples. The remaining 1000 - 992 = 8 samples form one more batch of size 8. The same model therefore has to accept batches of 32 and a batch of 8, and at prediction time you may pass yet another size. Leaving the batch dimension as None is what allows this: the actual size is only known at run time, each time a batch is passed through the model. So it is not a problem, and it is how it should be.
You normally don't change the None. You only specify the dimensions after it, which in your case are (2, 200). The 7000 is the total number of samples, not the batch size, so it never appears in the model's shapes. If you really need a fixed batch size (a stateful LSTM needs one, for example), you can declare it when defining the input, for instance with keras.Input(shape=(2, 200), batch_size=200), but then every batch you feed must have exactly that size, which is awkward because the number of samples is often not evenly divisible by the batch size.
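The batch arithmetic above can be sketched in plain Python. The helper name batch_sizes is made up for this illustration and is not a Keras function; it only lists the size of each batch in one pass over the data.

```python
def batch_sizes(num_samples, batch_size):
    """Return the size of every batch in one pass over the data."""
    full_batches, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:
        sizes.append(remainder)  # final, smaller batch
    return sizes


# The example from the answer: 1000 samples, batch size 32.
example = batch_sizes(1000, 32)
print(len(example), example[-1])  # 32 batches in total, the last one has 8 samples

# The data from the question: 7000 samples, batch size 200.
question = batch_sizes(7000, 200)
print(len(question), set(question))  # 35 batches, all of size 200
```

In the question's case the batches happen to divide evenly, but the model still reports None because it is built before any batch size is known.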
Related
I am trying to understand the output of BatchNormalization in Keras.
My model is:
#Import libraries
import numpy as np
import keras
from keras import layers
from keras.layers import Input, Dense, Activation, BatchNormalization, Flatten, Conv2D
from keras.models import Model
#Model
def HappyModel3(input_shape):
    X_input = Input(input_shape, name='input_layer')
    X = BatchNormalization(axis = 1, name = 'batchnorm_layer')(X_input)
    X = Dense(1, activation='sigmoid', name='sigmoid_layer')(X)
    model = Model(inputs = X_input, outputs = X, name='HappyModel3')
    return model
Compiling Model | here number of epochs is 1
X_train=np.array([[1,1,-1],[2,1,1]])
Y_train=np.array([0,1])
happyModel_1=HappyModel3(X_train[0].shape)
happyModel_1.compile(optimizer=keras.optimizers.RMSprop(), loss=keras.losses.mean_squared_error)
happyModel_1.fit(x = X_train, y = Y_train, epochs = 1 , batch_size = 2, verbose=0 )
finding Batch Normalisation layer's output for model with epochs=1:
for i in range(0, len(happyModel_1.layers)):
    tmp_model = Model(happyModel_1.layers[0].input, happyModel_1.layers[i].output)
    tmp_output = tmp_model.predict(X_train)
    if i in (0,1) :
        print(happyModel_1.layers[i].name)
        print(tmp_output.shape)
        print(tmp_output)
        print('\n')
Code Output is:
input_layer
(2, 3)
[[ 1. 1. -1.]
[ 2. 1. 1.]]
batchnorm_layer
(2, 3)
[[ 0.99003249 0.99388224 -0.99551398]
[ 1.99647105 0.99388224 0.9971655 ]]
We've normalized at axis=1.
Batch norm layer output: the mean of the 1st dimension is about 1.5, the mean of the 2nd dimension is about 1, and the mean of the 3rd dimension is about 0.
Since it's batch norm, I expected the mean to be close to 0 for all 3 dimensions.
This happens when I increase epochs to 1000:
happyModel_2=HappyModel3(X_train[0].shape)
happyModel_2.compile(optimizer=keras.optimizers.RMSprop(), loss=keras.losses.mean_squared_error)
happyModel_2.fit(x = X_train, y = Y_train, epochs = 1000 , batch_size = 2, verbose=0 )
finding Batch Normalisation layer's output for model with epochs=1000:
for i in range(0, len(happyModel_2.layers)):
    tmp_model = Model(happyModel_2.layers[0].input, happyModel_2.layers[i].output)
    tmp_output = tmp_model.predict(X_train)
    if i in (0,1) :
        print(happyModel_2.layers[i].name)
        print(tmp_output.shape)
        print(tmp_output)
        print('\n')
#Code output
input_layer
(2, 3)
[[ 1. 1. -1.]
[ 2. 1. 1.]]
batchnorm_layer
(2, 3)
[[ -1.95576239e+00 8.08715820e-04 -1.86621261e+00]
[ 1.95795488e+00 8.08715820e-04 1.86590290e+00]]
We've normalized at axis=1. Now the batch norm layer output has a mean of about 0 in the 1st, 2nd and 3rd dimensions. This is the output I expected.
My question is: is the output of batch normalization in Keras dependent on the number of epochs?
(Probably yes: as we do backpropagation, the batch normalization parameters will be affected by increasing the number of epochs.)
The Keras documentation for BatchNormalization gives an answer to your question:
Importantly, batch normalization works differently during training and
during inference.
What happens during training, i.e. when calling model.fit()?
During training [...], the layer normalizes its output
using the mean and standard deviation of the current batch of inputs.
But what will happen during inference, i.e. when calling model.predict() as in your examples?
During inference [...], the layer normalizes its output using a moving average of
the mean and standard deviation of the batches it has seen during
training. That is to say, it returns (batch - self.moving_mean) / (self.moving_var + epsilon) * gamma + beta.
self.moving_mean and self.moving_var are non-trainable variables that
are updated each time the layer in called in training mode [...].
It's important to understand that batch normalization estimates the statistics (mean and variance) of your whole training data during training by looking at the statistics of single batches and internally updating the moving_mean and moving_variance parameters with a running average computed from those single-batch statistics. These two parameters are therefore not affected by backpropagation. Ideally, after your model has seen enough training examples (or has trained for enough epochs), moving_mean and moving_variance will correspond to the statistics of your whole training set. They are then used during inference to normalize test examples. At the start of training the two parameters are initialized to 0 and 1, so after only a few updates they are still far from the real statistics. Furthermore, batch norm has two more parameters called gamma and beta, which are updated by the optimizer and therefore depend on your loss.
In essence, yes, the output of batch normalization during inference depends on the number of epochs you have trained your model for: firstly because of the changing moving averages for mean and variance, and secondly because of the learned parameters gamma and beta.
For a deeper understanding of how batch normalization works and why it is needed, have a look at the original publication.
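The effect of the moving averages can be reproduced with a small NumPy sketch. It is not the Keras implementation: it assumes momentum = 0.99 and epsilon = 1e-3 (the documented Keras defaults), keeps gamma = 1 and beta = 0 instead of learning them, and uses the square root of the variance, so the numbers will not match the printed outputs in the question exactly. The function names are made up for this illustration.

```python
import numpy as np

X_train = np.array([[1.0, 1.0, -1.0], [2.0, 1.0, 1.0]])
momentum, epsilon = 0.99, 1e-3  # assumed to match the Keras defaults


def moving_stats_after(num_updates, batch):
    """Apply the moving-average update num_updates times with the same batch."""
    moving_mean = np.zeros(batch.shape[1])  # initial value 0
    moving_var = np.ones(batch.shape[1])    # initial value 1
    batch_mean, batch_var = batch.mean(axis=0), batch.var(axis=0)
    for _ in range(num_updates):
        moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
        moving_var = momentum * moving_var + (1 - momentum) * batch_var
    return moving_mean, moving_var


def inference_output(batch, moving_mean, moving_var, gamma=1.0, beta=0.0):
    """Normalize with the moving statistics, as done at inference time."""
    return gamma * (batch - moving_mean) / np.sqrt(moving_var + epsilon) + beta


for updates in (1, 1000):
    mean, var = moving_stats_after(updates, X_train)
    out = inference_output(X_train, mean, var)
    print(updates, "updates -> per-feature output mean:", out.mean(axis=0).round(3))
```

After one update the moving mean has barely moved away from 0, so the output is almost the raw input. After 1000 updates it has converged to the batch mean and the output is centred near 0, which is the pattern seen in the question.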
I trained a (0,1) model with TensorFlow on data without NaNs. Is there any way to predict on inputs that contain NaN? I use 'adam' as the optimizer.
Making model:
input_size = 16
output_size = 2
hidden_layer_size = 50
model = tf.keras.Sequential([
tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
batch_size = 100
max_epochs = 20
early_stopping=tf.keras.callbacks.EarlyStopping()
model.fit(train_inputs, # train inputs
train_targets, # train targets
batch_size=batch_size, # batch size
epochs=max_epochs, # epochs that we will train for (assuming early stopping doesn't kick in)
callbacks=[early_stopping],
validation_data=(validation_inputs, validation_targets), # validation data
verbose = 1 # making sure we get enough information about the training process
)
Potential input I'd like to add:
x=np.array([[ 0.8048038 , 2.22810658, 0.7184345 , -0.59266753, 1.73062328,
0.69392477, -1.35764524, -0.55833263, 0.10620523, 1.31206921,
-1.07966389, 1.04462389, -0.99787875, 0.797905 , -0.35954954,
np.NaN]])
The return I get:
array([[nan, nan]], dtype=float32)
So is there any way to achieve it?
NaN is not a valid input here: any arithmetic that involves NaN yields NaN, so a single NaN feature propagates through every layer of the forward pass and the prediction becomes NaN too. There really is no good way for the model to do anything with it. You therefore have to either replace these NaNs with meaningful numbers, or you will be unable to use this data point and will have to drop the rows that contain them, like so:
x = x[np.isfinite(x).all(axis=1)]
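Both options can be sketched with NumPy. The arrays below are small hypothetical stand-ins (4 features instead of 16), and filling with the per-feature mean of the training inputs is only one possible choice of "meaningful number", not something the question prescribes.

```python
import numpy as np

# Hypothetical training inputs (3 samples, 4 features) without NaNs.
train_inputs = np.array([[1.0, 2.0, 3.0, 4.0],
                         [3.0, 2.0, 1.0, 0.0],
                         [2.0, 5.0, 2.0, 2.0]])
feature_means = train_inputs.mean(axis=0)  # one mean per feature column

# Hypothetical inputs to predict on; the first row has missing values.
x = np.array([[0.5, np.nan, 1.5, np.nan],
              [1.0, 2.0, 3.0, 4.0]])

# Option 1: keep only the rows in which every feature is finite.
complete_rows = x[np.isfinite(x).all(axis=1)]

# Option 2: replace each missing value with the training mean of its feature.
x_imputed = np.where(np.isfinite(x), x, feature_means)

print(complete_rows.shape)  # (1, 4)
print(x_imputed[0])         # missing entries replaced by 3.0 and 2.0
```

Either result keeps the 2-D (samples, features) layout, so it can be passed to model.predict.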
I would like to create a 'Sequential' model (a Time Series model as you might have guessed), that takes 20 days of past data with a feature size of 2, and predict 1 day into the future with the same feature size of 2.
I found out you need to specify the batch size for a stateful LSTM model, so if I specify a batch size of 32, for example, the final output shape of the model is (32, 2), which I think means the model is predicting 32 days into the future rather than 1.
How would I go about fixing it?
Also, asking before I run into the problem: if I specify a batch size of 32, for example, but I want to predict on an input of shape (1, 20, 2), would the model predict correctly, given that I changed the batch size from 32 to 1? Thank you.
You don't need to specify batch_size, but you should feed a 3-D tensor:
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras import Model, Sequential
features = 2
dim = 128
new_model = Sequential([
LSTM(dim, stateful=True, return_sequences = True),
Dense(2)
])
number_of_sequences = 1000
sequence_length = 20
inputs = tf.random.uniform([number_of_sequences, sequence_length, features], dtype=tf.float32)
output = new_model(inputs) # shape is (number_of_sequences, sequence_length, 2)
predicted = output[:, -1] # last time step only; shape is (number_of_sequences, 2)
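In an output shape of (32, 2), the 32 is the number of sequences in the batch, with one 2-feature prediction per sequence; it is not 32 days into the future.
Batch size is a parameter of training (how many sequences are fed to the model before the error is backpropagated; see the stochastic gradient descent method). It doesn't change the layout of your data, which should be 3-D: (number of sequences, length of sequence, features).
If you need to predict only one sequence, just feed a tensor of shape (1, 20, 2) to the model. Note that a layer with stateful=True keeps one state per sequence in the batch, so its batch size is fixed once it has been built; to predict with a different batch size, use a non-stateful model with the same weights.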
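To get data into that 3-D layout, a time series can be cut into overlapping windows. The following is a minimal NumPy sketch; make_windows is a hypothetical helper and the series is dummy data, used only to show the shapes for 20 past days, 2 features and a 1-day-ahead target.

```python
import numpy as np


def make_windows(series, window):
    """Split a (days, features) array into (samples, window, features) inputs
    and (samples, features) next-day targets."""
    inputs = np.stack([series[i:i + window] for i in range(len(series) - window)])
    targets = series[window:]  # the day that follows each window
    return inputs, targets


# Dummy series: 100 days with 2 features each.
series = np.arange(100 * 2, dtype=np.float32).reshape(100, 2)

inputs, targets = make_windows(series, window=20)
print(inputs.shape, targets.shape)  # (80, 20, 2) (80, 2)
```

Each row of targets is a single day with 2 features, so a model trained on these arrays predicts 1 day ahead per input sequence, whatever the batch size.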
My time series data has 2 features:
0 1
1/22/20 555.0 17.0
1/23/20 654.0 18.0
1/24/20 941.0 26.0
1/25/20 1434.0 42.0
1/26/20 2118.0 56.0
... ... ...
5/3/20 3506729.0 247470.0
5/4/20 3583055.0 251537.0
5/5/20 3662691.0 257239.0
5/6/20 3755341.0 263831.0
5/7/20 3845718.0 269567.0
[107 rows x 2 columns]
I am trying to create a multivariate LSTM to make predictions for each of the columns. After processing the data the train and test arrays have the following shapes:
Legend: (samples, time steps, features)
x_train: (67, 4, 2)
y_train: (67, 2)
x_test: (26, 4, 2)
y_test: (26, 2)
Here is the model definition:
forecast_horizon = 4
feature_n = 2
early_stopping = EarlyStopping(patience=50, restore_best_weights=True)
model = Sequential()
model.add(LSTM(5, input_shape=(forecast_horizon, feature_n)))
model.add(Activation("relu"))
model.add(Dropout(0.1))
model.add(Dense(feature_n))
model.add(Activation("relu"))
model.compile(loss="mean_squared_error", optimizer="adam")
history = model.fit(x_train, y_train, epochs=1000, batch_size=1, verbose=0,
callbacks=[early_stopping], validation_split=0.2)
The predictions are full of zeros. The output of test_predictions = model.predict(x_test) is:
[[0.00839295 0.007538 ]
[0. 0. ]
[0.00946797 0.00663883]
[0. 0. ]
[0. 0. ]
... ...
[0.0007435 0. ]
[0.00116019 0.00032421]
[0. 0. ]
[0. 0. ]
[0. 0. ]]
When looking at the training loss it seems that the model is not learning very well.
Is this a matter of simply training the model for longer and adjusting its hyperparameters or is there something else that could be affecting this? How can I implement a proper multivariate LSTM?
A batch size of 1 means your model weights are adjusted after every single observation rather than being optimized over a handful of observations at a time, which makes the updates noisy. Common batch sizes are between 16 and 32, but this can be adjusted depending on the model.
LSTM models also tend to need far more data than the 67 training sequences you have, often thousands of observations, so get more training data if possible.
Architectures can also vary, so it's best to try a number of different approaches and see what works best. You can find more info here: https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
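To make the batch-size point concrete, here is a rough count of weight updates per epoch for the shapes in the question. It assumes that validation_split=0.2 holds out the last 20% of the 67 samples before batching, which is how Keras is understood to apply it; the variable names are only for this illustration.

```python
import math

num_samples = 67        # rows in x_train
validation_split = 0.2

# Assumption: the held-out part is removed first, the rest is used for training.
train_samples = int(num_samples * (1 - validation_split))

# One weight update happens per batch.
updates_per_epoch = {
    batch_size: math.ceil(train_samples / batch_size)
    for batch_size in (1, 16, 32)
}

print(train_samples)      # 53
print(updates_per_epoch)  # {1: 53, 16: 4, 32: 2}
```

With batch_size=1 every one of the 53 updates is driven by a single sample, while with 16 or 32 each update averages the gradient over several samples.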
I'm using a custom training loop. The loss returned by tf.keras.losses.categorical_crossentropy is an array of shape, I'm assuming, (1, batch_size). Is this what it is supposed to return, or should it be a single value?
In the latter case, any idea what I could be doing wrong?
If your predictions have the shape (samples in batch, classes), tf.keras.losses.categorical_crossentropy returns the losses with the shape (samples in batch,), i.e. one value per sample.
So, if your labels are:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
And your predictions are:
[[0.9 0.05 0.05]
[0.5 0.89 0.6 ]
[0.05 0.01 0.94]]
You will get a loss like:
[0.10536055 0.8046684 0.06187541]
In most cases the mean of these values is used to update the model parameters. So if you do the updates manually, you can use:
loss = tf.keras.backend.mean(losses)
Most usual losses return the original shape minus the last axis.
So, if your original y_pred shape was (samples, ..., ..., classes), then your resulting shape will be (samples, ..., ...).
This is probably because Keras may use this tensor in further calculations, for sample weights and maybe other things.
In a custom loop, if these dimensions are useless, you can simply take a K.mean(loss_result) before calculating the gradients. (Where K is either keras.backend or tensorflow.keras.backend)
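Both answers can be checked with a small NumPy sketch of the per-sample loss. It is a re-implementation for illustration, not the Keras source: it assumes that, for probability inputs, each prediction row is rescaled to sum to 1 and clipped before the logarithm is taken, which is consistent with the second loss value above (that row does not sum to 1).

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample cross-entropy; reduces only the last (class) axis."""
    y_pred = y_pred / y_pred.sum(axis=-1, keepdims=True)  # rows sum to 1
    y_pred = np.clip(y_pred, eps, 1 - eps)                # avoid log(0)
    return -(y_true * np.log(y_pred)).sum(axis=-1)


y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
y_pred = np.array([[0.9, 0.05, 0.05],
                   [0.5, 0.89, 0.6],
                   [0.05, 0.01, 0.94]])

losses = categorical_crossentropy(y_true, y_pred)
print(losses.shape)   # (3,): one loss per sample
print(losses)         # approximately [0.1054 0.8047 0.0619]
print(losses.mean())  # single scalar to use for the gradient step

# Higher-rank input: shape (2, 3, 3) gives per-position losses of shape (2, 3).
stacked = categorical_crossentropy(np.stack([y_true, y_true]),
                                   np.stack([y_pred, y_pred]))
print(stacked.shape)
```

In a custom loop the mean is the value to differentiate, which corresponds to the tf.keras.backend.mean call suggested above.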