Why does my multivariate LSTM keep predicting zeroes? - python

My time series data has 2 features:
                 0         1
1/22/20      555.0      17.0
1/23/20      654.0      18.0
1/24/20      941.0      26.0
1/25/20     1434.0      42.0
1/26/20     2118.0      56.0
...            ...       ...
5/3/20   3506729.0  247470.0
5/4/20   3583055.0  251537.0
5/5/20   3662691.0  257239.0
5/6/20   3755341.0  263831.0
5/7/20   3845718.0  269567.0

[107 rows x 2 columns]
I am trying to create a multivariate LSTM to make predictions for each of the columns. After processing the data, the train and test arrays have the following shapes:
Legend: (samples, time steps, features)
x_train: (67, 4, 2)
y_train: (67, 2)
x_test: (26, 4, 2)
y_test: (26, 2)
Here is the model definition:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Activation
from tensorflow.keras.callbacks import EarlyStopping

forecast_horizon = 4
feature_n = 2
early_stopping = EarlyStopping(patience=50, restore_best_weights=True)

model = Sequential()
model.add(LSTM(5, input_shape=(forecast_horizon, feature_n)))
model.add(Activation("relu"))
model.add(Dropout(0.1))
model.add(Dense(feature_n))
model.add(Activation("relu"))
model.compile(loss="mean_squared_error", optimizer="adam")

history = model.fit(x_train, y_train, epochs=1000, batch_size=1, verbose=0,
                    callbacks=[early_stopping], validation_split=0.2)
The predictions are full of zeros. The output of test_predictions = model.predict(x_test) is:
[[0.00839295 0.007538 ]
[0. 0. ]
[0.00946797 0.00663883]
[0. 0. ]
[0. 0. ]
... ...
[0.0007435 0. ]
[0.00116019 0.00032421]
[0. 0. ]
[0. 0. ]
[0. 0. ]]
Looking at the training loss, it seems the model is not learning very well.
Is this simply a matter of training the model for longer and adjusting its hyperparameters, or is there something else that could be affecting this? How can I implement a proper multivariate LSTM?

A batch size of 1 means your model weights are adjusted based on a single observation rather than being optimized over a handful of observations at a time. Common batch sizes are between 16 and 32, but can be adjusted depending on the model.
LSTM models also tend to require thousands of observations, so get more training data if possible.
Architectures can also vary, so it's best to try a number of different approaches and see what works best. You can find more info here: https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
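As one concrete variation of the kind suggested above (a minimal sketch, not a confirmed fix: it assumes x_train/y_train are scaled to a small range, e.g. with MinMaxScaler, and it reuses forecast_horizon and feature_n from the question), you could combine a larger batch size with a plain linear output layer, which is the usual choice for MSE regression targets:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

# sketch: larger batch size and a linear output layer
model = Sequential()
model.add(LSTM(32, input_shape=(forecast_horizon, feature_n)))
model.add(Dropout(0.1))
model.add(Dense(feature_n))  # no activation -> linear output for MSE regression
model.compile(loss="mean_squared_error", optimizer="adam")

early_stopping = EarlyStopping(patience=50, restore_best_weights=True)
history = model.fit(x_train, y_train, epochs=1000, batch_size=16, verbose=0,
                    callbacks=[early_stopping], validation_split=0.2)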

Related

Keras LSTM None value output shape

This is my X_train data, prepared for the LSTM, with shape (7000, 2, 200):
[[[0.500858 0. 0.5074856 ... 1. 0.4911533 0. ]
[0.4897923 0. 0.48860878 ... 0. 0.49446714 1. ]]
[[0.52411383 0. 0.52482396 ... 0. 0.48860878 1. ]
[0.4899698 0. 0.48819458 ... 1. 0.4968341 1. ]]
...
[[0.6124623 1. 0.6118705 ... 1. 0.6328777 0. ]
[0.6320492 0. 0.63512635 ... 1. 0.6960175 0. ]]
[[0.6118113 1. 0.6126989 ... 0. 0.63512635 1. ]
[0.63530385 1. 0.63595474 ... 1. 0.69808865 0. ]]]
I create my sequential model
model = Sequential()
model.add(LSTM(units = 50, activation = 'relu', input_shape = (X_train.shape[1], 200)))
model.add(Dropout(0.2))
model.add(Dense(1, activation = 'linear'))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
Then I fit my model:
history = model.fit(
    X_train,
    Y_train,
    epochs=20,
    batch_size=200,
    validation_data=(X_test, Y_test),
    verbose=1,
    shuffle=False,
)
model.summary()
And at the end I can see something like this:
Layer (type)                 Output Shape              Param #
=================================================================
lstm_16 (LSTM)               (None, 2, 50)              50200
dropout_10 (Dropout)         (None, 2, 50)              0
dense_10 (Dense)             (None, 2, 1)               51
Why does it say that the output shape has a None value as its first element? Is this a problem, or should it be like this? What does it change, and how can I change it?
I will appreciate any help, thanks!
The first value in TensorFlow is always reserved for the batch size. Your model doesn't know your batch size in advance, so it makes that dimension None. In more detail: suppose your dataset has 1000 samples and your batch size is 32. Then 1000/32 = 31.25, so taking the floor gives 31 full batches of size 32. But 31 * 32 = 992 and 1000 - 992 = 8, so there is one extra batch of size 8. The model doesn't know any of this in advance, so it reserves a dimension without a fixed size; in other words, the batch dimension is dynamic. That is why you see None there: the actual batch size is only resolved when the batches are fed through the model, for example during the first epoch of training.
The None value can't be changed, because it is dynamic in TensorFlow; you only set the shapes that come after it, which in your case is (2, 200). The 7000 is the total number of samples in your dataset, not something the model needs to know up front. And since the total number of samples is usually not evenly divisible by the batch size, the model has to keep that dimension as None and determine the size of each batch as it processes them.
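As a small illustration (a sketch on random data, not the data from the question), the None just means the batch dimension is left flexible, so the same model accepts any number of samples at predict time:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# same layer definitions as in the question: 2 timesteps, 200 features
model = Sequential()
model.add(LSTM(units=50, activation='relu', input_shape=(2, 200)))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')

# the leading None means "any batch size": 1, 17, or all 7000 samples all work
print(model.predict(np.random.rand(1, 2, 200)).shape)
print(model.predict(np.random.rand(17, 2, 200)).shape)
print(model.predict(np.random.rand(7000, 2, 200)).shape)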

Predicting with Nan in input

I trained a (0, 1) classification model with TensorFlow, but without any NaNs in the training data. Is there any way to predict on values that contain a NaN? I use 'adam' as the optimizer.
Making model:
input_size = 16
output_size = 2
hidden_layer_size = 50

model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),  # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),  # 2nd hidden layer
    tf.keras.layers.Dense(output_size, activation='softmax')      # output layer
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

batch_size = 100
max_epochs = 20
early_stopping = tf.keras.callbacks.EarlyStopping()

model.fit(train_inputs,        # train inputs
          train_targets,       # train targets
          batch_size=batch_size,
          epochs=max_epochs,   # epochs we will train for (assuming early stopping doesn't kick in)
          callbacks=[early_stopping],
          validation_data=(validation_inputs, validation_targets),  # validation data
          verbose=1            # making sure we get enough information about the training process
          )
Potential input I'd like to add:
x=np.array([[ 0.8048038 , 2.22810658, 0.7184345 , -0.59266753, 1.73062328,
0.69392477, -1.35764524, -0.55833263, 0.10620523, 1.31206921,
-1.07966389, 1.04462389, -0.99787875, 0.797905 , -0.35954954,
np.NaN]])
The return I get:
array([[nan, nan]], dtype=float32)
So is there any way to achieve it?
The network needs to be able to do numerical computations with the input, and NaN is not a valid value for that: anything computed from a NaN is NaN again, which is exactly the [[nan, nan]] output you are getting. You therefore have to either replace these NaNs with meaningful numbers, or you will be unable to use that data point and will have to drop it, for example:
x = x[~np.isnan(x).any(axis=1)]  # keep only the rows (samples) that contain no NaNs
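A minimal sketch of the first option, replacing the NaNs with a "meaningful" number before predicting; filling with the column means of the training inputs is just one possible choice and is an assumption here, not something from the question:
import numpy as np

# assumes train_inputs is the NaN-free training matrix and x is the (1, 16) input containing a NaN
col_means = np.nanmean(train_inputs, axis=0)

x_filled = x.copy()
rows, cols = np.where(np.isnan(x_filled))
x_filled[rows, cols] = col_means[cols]  # fill each NaN with its column's mean

print(model.predict(x_filled))  # no longer returns [[nan, nan]]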

Unable to create confusion matrix after prediction in Keras - Can't handle a mix of multilabel-indicator [duplicate]

I want to evaluate my Keras model with a confusion matrix. However, I cannot make it work because I keep getting the same error:
ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets
I'm looking at this question:
confusion matrix error "Classification metrics can't handle a mix of multilabel-indicator and multiclass targets"
I have tried to mimic everything from that question, but it does not work. I think this is not the same case.
This is my code:
validationTweets = validation.content.tolist()                   # data for validation
validation_features = vectorizerCV.transform(validationTweets)   # vectorizing data for validation
prediction = model.predict(validation_features, batch_size=32)   # making prediction
realLabels = validation.sentiment                                 # true labels/classes (STRING VALUES)
realLabels = np.asarray(realLabels.factorize()[0])                # converting string labels to integer codes
realLabels = to_categorical(realLabels, num_classes=3)            # converting integer codes to one-hot vectors
print('true labels type', type(realLabels)) #<class 'numpy.ndarray'>
print('true labels shape',realLabels.shape) # (5000, 3)
print('prediction type', type(prediction)) #<class 'numpy.ndarray'>
print('prediction shape', prediction.shape) #(5000, 3)
matrix = confusion_matrix(realLabels, prediction)
This is what my real labels look like:
[[1. 0. 0.]
[1. 0. 0.]
[0. 1. 0.]
...
[0. 1. 0.]
[0. 1. 0.]
[0. 0. 1.]]
This is what my prediction looks like:
[[8.6341507e-04 6.8435425e-01 3.1478229e-01]
[8.4774427e-02 7.8772342e-01 1.2750208e-01]
[4.3412593e-01 5.0705791e-01 5.8816209e-02]
...
[9.1305929e-01 6.6390157e-02 2.0550590e-02]
[8.2271063e-01 1.5146920e-01 2.5820155e-02]
[1.7649201e-01 7.2304797e-01 1.0045998e-01]]
I have tried this:
prediction = [np.round(p, 0) for p in prediction]
ERROR: multilabel-indicator is not supported
I have also tried this:
prediction = prediction.argmax(axis = 1) # shape is (5000,)
ERROR: ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets
But I'm getting the same error.
I'm not super familiar with scikit-learn's confusion_matrix, but rounding your predictions won't work for multiclass. For each sample, you need to take the highest-probability predicted class and set that entry to one in your prediction. For example:
pred = np.array([0.3, 0.3, 0.4])
rounded = np.round(pred) # Gives [0, 0, 0]
most_likely = np.zeros(3)
most_likely[pred >= np.max(pred)] = 1 # Gives [0,0,1]
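Applied to the shapes in the question, a common way to make scikit-learn's confusion_matrix accept the arrays (a sketch extending the idea above, not part of the original answer) is to reduce both the one-hot labels and the probability predictions to class indices:
from sklearn.metrics import confusion_matrix

# reduce both the one-hot labels and the probability predictions to class indices
true_classes = realLabels.argmax(axis=1)   # shape (5000,), values in {0, 1, 2}
pred_classes = prediction.argmax(axis=1)   # shape (5000,), values in {0, 1, 2}

print(confusion_matrix(true_classes, pred_classes))  # 3x3 confusion matrix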

Keras Multi-class Multi-label image classification: handle a mix of independent and dependent labels & non-binary output

I am trying to train a pre-trained VGG16 model from Keras for a multi-class, multi-label classification task. The images are from the Chest X-Ray 8 dataset from NIH. The dataset has 14 labels (14 diseases) plus a "no finding" label.
I understand that for independent labels, like the 14 diseases, I should use sigmoid activation + binary_crossentropy loss; and for dependent labels, I should use softmax + categorical_crossentropy.
However, out of my 15 labels in total, 14 are independent, but "no finding" is technically dependent on the other 14: the probability of "no finding" and the probability of having disease(s) should add up to 1, while the probabilities of which disease(s) are present should be given independently. So what loss should I use?
Besides, my output is an array of floats (probabilities), where each column is a label.
y_true:
[[0. 0. 0. ... 0. 0. 1.]
[0. 0. 0. ... 0. 0. 1.]
[0. 0. 1. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 1.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 1.]]
y_predict:
[[0.1749 0.0673 0.1046 ... 0. 0. 0.112 ]
[0. 0.1067 0.2804 ... 0. 0. 0.722 ]
[0. 0. 0.0686 ... 0. 0. 0.5373]
...
[0.0571 0.0679 0.0815 ... 0. 0. 0.532 ]
[0.0723 0.0555 0.2373 ... 0. 0. 0.4263]
[0.0506 0.1305 0.4399 ... 0. 0. 0.2792]]
Such a result makes it impossible to use the classification_report() function to evaluate my model. I am thinking about choosing a threshold to turn the probabilities into binary values, but that feels more like manual modification than CNN prediction, since I have to select the threshold myself. So I am unsure whether I should hard-code something like that, or whether there are existing methods that deal with this situation?
I am quite new to CNNs and classification, so if anyone can guide me or give me any hint I will appreciate it very much. Thank you!
Main body code as below:
vgg16_model = VGG16()
last_layer = vgg16_model.get_layer('fc2').output

# I am treating them all as independent labels
out = Dense(15, activation='sigmoid', name='output_layer')(last_layer)
custom_vgg16_model = Model(inputs=vgg16_model.input, outputs=out)

for layer in custom_vgg16_model.layers[:-1]:
    layer.trainable = False

custom_vgg16_model.compile(Adam(learning_rate=0.00001),
                           loss="binary_crossentropy",
                           metrics=['accuracy'])
# metrics=accuracy gives me a very good result, but I suppose it is due to the large
# amount of 0 labels (not-this-disease predictions), therefore I am thinking of changing
# it to recall and precision as metrics. If you have any suggestions on this I'd also
# like to hear them!
Some updates on my project: I have actually managed to solve most of the problems mentioned in this question.
Firstly, as this is a multi-class, multi-label classification problem, I decided to use the ROC-AUC score instead of precision or recall as the evaluation metric. The advantage is that no threshold value is involved: AUC is a bit like an average of the performance over a range of thresholds. And it only looks at the positive predictions, so it reduces the effect of the majority of 0s in the dataset. This gives a more accurate picture of the model's performance in my case.
For the output classes, I decided to use 14 classes instead of 15: if all labels are 0, it means "no finding". Then I can happily use sigmoid activation in my output layer. On top of that, I use focal loss instead of binary cross-entropy, as my dataset is highly imbalanced.
I still have a problem, as my ROC curves are not good (very close to y=x and sometimes below it). But I hope my progress can give some inspiration to anyone who finds this.
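For reference, a minimal sketch of how the per-class ROC-AUC evaluation described above could be computed with scikit-learn (the 14-column y_true / y_predict layout is the one from the update, and the variable names are assumptions):
import numpy as np
from sklearn.metrics import roc_auc_score

# y_true:    (n_samples, 14) binary indicator matrix (all zeros = "no finding")
# y_predict: (n_samples, 14) sigmoid outputs from the model
# (each column is assumed to contain at least one positive and one negative sample,
#  otherwise roc_auc_score raises an error for that class)
per_class_auc = [roc_auc_score(y_true[:, i], y_predict[:, i])
                 for i in range(y_true.shape[1])]

macro_auc = float(np.mean(per_class_auc))  # macro-average over the 14 disease labels
print(per_class_auc, macro_auc)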

Tuning LSTM autoencoder performance

I am trying to build an autoencoder for a multidimensional time series. I have followed various templates around the internet and SO, but all of them focus on how to get it running; I haven't found one on how to get it running and get meaningful results.
I've followed the tutorials starting here: https://blog.keras.io/building-autoencoders-in-keras.html; a practical example here: https://machinelearningmastery.com/lstm-autoencoders/.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector
import matplotlib.pyplot as plt
# this sequence comes out of a MinMaxScaler. A separate question is if this was a good idea?
sequence = np.array([[0.63306452, 0.00714286],
[0.42069892, 0. ],
[0.36155914, 0.15 ],
[0.53629032, 0.12142857],
[0.32526882, 0.24285714],
[0.26344086, 0.52142857],
[0. , 0.79285714],
[0.49731183, 0.71428571],
[0.60080645, 0.25714286],
[0.63037634, 0.11428571],
[0.70698925, 0.26428571],
[0.71774194, 0.21428571],
[0.6155914 , 0.10714286],
[0.56451613, 0.36428571],
[0.66397849, 0.2 ],
[0.76344086, 0.17857143],
[0.66801075, 0.07142857],
[0.66935484, 0.02857143],
[0.90725806, 0.32857143],
[1. , 0.28571429],
[1. , 0.4 ],
[0.81451613, 0.47857143],
[0.41532258, 0.52142857],
[0.55107527, 0.63571429],
[0.42741935, 0.40714286],
[0.56989247, 0.75 ],
[0.76075269, 0.55 ],
[0.69758065, 0.58571429],
[0.73521505, 0.89285714],
[0.77150538, 1. ]])
n_in = len(sequence)
dim_in = sequence.shape[1]
latent_dim = 10
sequence = sequence.reshape((1, n_in, dim_in))
model = Sequential()
model.add(LSTM(latent_dim, input_shape=(n_in, dim_in)))
model.add(RepeatVector(n_in))
model.add(LSTM(dim_in, return_sequences=True))
model.compile(optimizer='adam', loss='mse')
model.summary()
model.fit(sequence, sequence, epochs=1000, verbose=0)
yhat = model.predict(sequence, verbose=0)
plt.figure(1)
plt.subplot(221)
plt.plot(sequence[0, :, 0])
plt.subplot(223)
plt.plot(yhat[0, :, 0])
plt.subplot(222)
plt.plot(sequence[0, :, 1])
plt.subplot(224)
plt.plot(yhat[0, :, 1])
The result I'm getting is not satisfactory (actuals in the upper row; autoencoder output in the lower row).
The decoded series are missing important features (like the spike on the RHS or the drop on the LHS). Given the 'compression ratio' of 30:10, I would expect those events to be somehow reflected. I've tried playing with epochs, batch sizes, and various activations and losses.
Anything obvious I am missing?
I want to run it on a much larger sequence (5000 time points, each point of potentially high dimension). Any tips for this?
Should I change my approach altogether? The author of this blog post https://towardsdatascience.com/autoencoders-for-the-compression-of-stock-market-data-28e8c1a2da3e didn't manage to make it work with an LSTM either...
