How to improve LSTM model predictions and accuracy? - python

After creating a pre-trained embedding layer with gensim, my val_accuracy has dropped to 45% on 4600 records:
model = models.Sequential()
model.add(Embedding(input_dim=MAX_NB_WORDS, output_dim=EMBEDDING_DIM,
                    weights=[embedding_model], trainable=False,
                    input_length=seq_len, mask_zero=True))
#model.add(SpatialDropout1D(0.2))
#model.add(Embedding(vocabulary_size, 64))
model.add(GRU(units=150, return_sequences=True))
model.add(Dropout(0.4))
model.add(LSTM(units=200,dropout=0.4))
#model.add(Dropout(0.8))
#model.add(LSTM(100))
#model.add(Dropout(0.4))
#Bidirectional(tf.keras.layers.LSTM(embedding_dim))
#model.add(LSTM(400,input_shape=(1117, 100),return_sequences=True))
#model.add(Bidirectional(LSTM(128)))
model.add(Dense(100, activation='relu'))
#
#model.add(Dropout(0.4))
#model.add(Dense(200, activation='relu'))
model.add(Dense(4, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_4 (Embedding) (None, 50, 100) 2746300
_________________________________________________________________
gru_4 (GRU) (None, 50, 150) 112950
_________________________________________________________________
dropout_4 (Dropout) (None, 50, 150) 0
_________________________________________________________________
lstm_4 (LSTM) (None, 200) 280800
_________________________________________________________________
dense_7 (Dense) (None, 100) 20100
_________________________________________________________________
dense_8 (Dense) (None, 4) 404
=================================================================
Total params: 3,160,554
Trainable params: 414,254
Non-trainable params: 2,746,300
_________________________________________________________________
Full code is at
https://colab.research.google.com/drive/13N94kBKkHIX2TR5B_lETyuH1QTC5VuRf?usp=sharing
Any help would be greatly appreciated. I am new to deep learning and have tried almost everything I know, but I am out of ideas now.

The problem is with your input. You've padded your input sequences with zeros but have not passed this information to your model, so the model doesn't ignore the zeros, which is why it's not learning at all. To resolve this, change your embedding layer as follows:
model.add(layers.Embedding(input_dim=vocab_size+1,
                           output_dim=embedding_dim,
                           mask_zero=True))
This will enable your model to ignore the zero padding and learn. Training with this, I got a training accuracy of 100% in just 6 epochs, though validation accuracy wasn't that good (around 54%), which is expected as your training data contains only 32 examples. More about the embedding layer: https://keras.io/api/layers/core_layers/embedding/
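To make the effect of mask_zero=True more concrete, here is a minimal NumPy sketch of the idea: id 0 is reserved for padding, and a boolean mask marks the real tokens so that later layers can skip the padded steps. The helper pad_post and the toy sequences are illustrative assumptions, not taken from the asker's notebook; Keras derives an equivalent mask internally when mask_zero=True is set.

```python
import numpy as np

def pad_post(sequences, maxlen, pad_value=0):
    """Right-pad (or truncate) integer sequences to a fixed length."""
    out = np.full((len(sequences), maxlen), pad_value, dtype=np.int64)
    for i, seq in enumerate(sequences):
        trimmed = seq[:maxlen]
        out[i, :len(trimmed)] = trimmed
    return out

# Toy token-id sequences; id 0 is reserved for padding, so real words start at 1.
sequences = [[4, 7, 2], [9, 1], [3, 5, 8, 6, 2]]
padded = pad_post(sequences, maxlen=5)

# The mask is True for real tokens and False for padding.
mask = padded != 0
lengths = mask.sum(axis=1)

print(padded)
print(mask)
print(lengths)
```

Because id 0 is taken by padding, the vocabulary needs one extra row, which is why the embedding layer above uses input_dim=vocab_size+1.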
Since your dataset is small, the model tends to overfit on training data quite easily which gives lower validation accuracy. To mitigate this to some extent, you can try using pre-trained word embeddings like word2vec or GloVe instead of training your own embedding layer. Also, try some text data augmentation methods like creating artificial data using templates or replacing words in training data with their synonyms. You can also experiment with different types of layers (like replacing GRU with another LSTM) but in my opinion that may not help much here and should be considered after trying out pre-trained embeddings and data augmentation.
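As a rough sketch of the pre-trained embedding suggestion, the snippet below builds an embedding matrix whose rows line up with the tokenizer's word ids. The dictionaries pretrained and word_index are hypothetical stand-ins (for vectors loaded from word2vec or GloVe and for a tokenizer vocabulary), and build_embedding_matrix is an illustrative helper, not a library function.

```python
import numpy as np

def build_embedding_matrix(word_index, pretrained, dim):
    """Row i holds the vector of the word with id i; row 0 stays zero for padding."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype=np.float32)
    missing = []
    for word, idx in word_index.items():
        vector = pretrained.get(word)
        if vector is None:
            missing.append(word)  # out-of-vocabulary words keep a zero row
        else:
            matrix[idx] = vector
    return matrix, missing

# Hypothetical 3-dimensional vectors standing in for word2vec/GloVe entries.
pretrained = {
    "good": np.array([0.1, 0.2, 0.3]),
    "bad": np.array([-0.1, -0.2, -0.3]),
}
# Hypothetical tokenizer vocabulary; ids start at 1 because 0 is the padding id.
word_index = {"good": 1, "bad": 2, "meh": 3}

embedding_matrix, missing = build_embedding_matrix(word_index, pretrained, dim=3)
print(embedding_matrix.shape)
print(missing)
```

A matrix like this can be supplied as the initial weights of the embedding layer. It is worth checking how long the missing list is: if many of your words have no pre-trained vector, the embedding will probably help less than hoped.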

Related

Why LSTM predicts the same value for all test cases?

I have a regression task and time series data. For each observation I need to predict one outcome value. My data is a series of images. I have hand-crafted 32 features from my images. Images have 10 channels. My data has 4D shape: (observations, time steps, channels, features), e.g. (3348, 121, 10, 32). After normalisation one channel for one observation looks like this:
matplotlib.pyplot.matshow(normalized[170,:,0,:].transpose())
The figure shows 121 time steps (x-axis) and each time step has features on rows (32). The intensity of feature value is shown in colors. So there seems to be something happening in time.
Question 1: How to apply RNN in such a task?
Could I somehow use CNN to extract information from my 3rd and 4th axis of my data (as shown in figure above)?
A proposed solution (and troubles ahead):
I have flattened the data into 3D. I think I have sadly degenerated information for the learner, but at least this (almost) works:
m,n = xtrain4d.shape[:2]
xtrain3d = xtrain4d.reshape(m,n,-1)
Shape is now: (3348, 121, 320). Target ytrain.shape is (3348,).
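A small NumPy check, run on a toy array rather than the real (3348, 121, 10, 32) data, suggests that this reshape keeps every value and only merges the channel and feature axes into one. The dimensions below are made-up stand-ins chosen to keep the example readable.

```python
import numpy as np

# Small stand-in for the (3348, 121, 10, 32) array in the question.
obs, steps, channels, features = 2, 3, 4, 5
x4d = np.arange(obs * steps * channels * features).reshape(obs, steps, channels, features)

m, n = x4d.shape[:2]
x3d = x4d.reshape(m, n, -1)

# With NumPy's default C order, feature f of channel c lands at column c * features + f.
c, f = 2, 3
same_value = x3d[1, 2, c * features + f] == x4d[1, 2, c, f]

print(x3d.shape)
print(bool(same_value))
```

So the flattened input still contains all the numbers; what the model no longer sees explicitly is which columns belong to the same channel.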
Here's my LSTM:
def LSTMrecurrentNN(shape1, shape2):
    model = Sequential()
    model.add(LSTM(128, return_sequences=True, input_shape=(shape1, shape2)))
    model.add(Dropout(0.5))
    model.add(Dense(64, activation = 'relu'))
    model.add(Dropout(0.5))
    model.add(Dense(16, activation = 'relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='linear'))
    return model
model = LSTMrecurrentNN(xtrain3d.shape[1], xtrain3d.shape[2])
model.summary()
Model: "sequential_12"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_11 (LSTM) (None, 121, 128) 229888
_________________________________________________________________
dropout_18 (Dropout) (None, 121, 128) 0
_________________________________________________________________
dense_21 (Dense) (None, 121, 64) 8256
_________________________________________________________________
dropout_19 (Dropout) (None, 121, 64) 0
_________________________________________________________________
dense_22 (Dense) (None, 121, 16) 1040
_________________________________________________________________
dropout_20 (Dropout) (None, 121, 16) 0
_________________________________________________________________
dense_23 (Dense) (None, 121, 1) 17
=================================================================
Total params: 239,201
Trainable params: 239,201
Non-trainable params: 0
_________________________________________________________________
Running the model:
epochs = 20
batchsize=128
learningrate=0.001
epsilon=0.1
# monitor validation progress:
early = EarlyStopping(monitor = "val_loss", mode = "min", patience = 10)
callbacks_list = [early]
# compile:
model.compile(loss = 'mean_squared_error',
              optimizer = Adam(learning_rate=learningrate, epsilon = epsilon),
              metrics = ['mse'])
# and train the model
history = model.fit(xtrain3d, ytrain,
                    epochs=epochs, batch_size=batchsize, verbose=0,
                    validation_split = 0.20,
                    callbacks = callbacks_list)
# predict:
test_predictions = model.predict(Xtest)
Training and validation performance looks ok:
But on the test set the model predicts one value for all observations! The figure below shows how for 10 observations in the test set the model predicts already in the early time steps a value of 3267 that is close to the mean of target y.
Statistics tell the same:
scipy.stats.describe(test_predictions[:,-1,0])
DescribeResult(nobs=1544, minmax=(3267.813, 3267.813),
mean=3267.8127, variance=5.964328e-08, skewness=1.0, kurtosis=-2.0)
For target y:
scipy.stats.describe(ytest)
DescribeResult(nobs=1544, minmax=(0.0, 8000.0),
mean=3312.1081606217617,
variance=1381985.8476585718, skewness=0.2847730511366937, kurtosis=0.20894280037919222)
Question 2: Why does the model predict the same value for all observations?
Any hints, how to check LSTM behaviour (states)? I would like to know how far back it "remembers".

How do I correctly shape my input data for a Keras model?

I'm currently working on a Keras neural network for fun. I'm just learning the basics, but I can't get past this dimension problem:
My input data (X) should be a 12x6 matrix, with 12 time steps and 6 different data values for every time step:
X = np.zeros([2867, 12, 6])
Y = np.zeros([2867, 3])
My output (Y) should be a one-hot encoded 3x1 vector.
Now I want to feed this data through the following LSTM model.
model = Sequential()
model.add(LSTM(30, activation="softsign", return_sequences=True, input_shape=(12, 6)))
model.add(Dense(3))
model.summary()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x=X, y=Y, batch_size=100, epochs=1000, verbose=2, validation_split=0.2)
The Summary looks like this:
Model: "sequential"
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 12, 30) 4440
_________________________________________________________________
dense (Dense) (None, 12, 3) 93
=================================================================
Total params: 4,533
Trainable params: 4,533
Non-trainable params: 0
_________________________________________________________________
When I run this program, I get this error:
ValueError: Shapes (None, 3) and (None, 12, 3) are incompatible.
I already tried reshaping my data to a 72x1 vector, but this doesn't work either.
Maybe someone can help me shape my input data correctly :).
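You probably need to define your model as follows, since you are using the categorical_crossentropy loss function. With return_sequences=False the LSTM returns only its last output, so the model output has shape (None, 3) and matches Y, and the softmax activation turns that output into class probabilities:
model = Sequential()
model.add(LSTM(30, activation="softsign",
               return_sequences=False, input_shape=(12, 6)))
model.add(Dense(3, activation='softmax'))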
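The shape mismatch in the error message can be reproduced without Keras. The NumPy sketch below uses random arrays as stand-ins for the LSTM output and the dense weights (both are assumptions for illustration only) to show why a per-time-step output has shape (batch, 12, 3), while keeping only the last step gives (batch, 3).

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, units, classes = 4, 12, 30, 3

# Stand-in for the LSTM output with return_sequences=True: one vector per time step.
sequence_output = rng.normal(size=(batch, timesteps, units))
# Stand-in for return_sequences=False: only the last time step is kept.
last_output = sequence_output[:, -1, :]

# A dense layer multiplies the last axis by a (units, classes) weight matrix.
weights = rng.normal(size=(units, classes))
per_step_logits = sequence_output @ weights
final_logits = last_output @ weights

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

probabilities = softmax(final_logits)
print(per_step_logits.shape)
print(probabilities.shape)
```

Only the second shape can be compared against one-hot targets of shape (batch, 3), and each row of probabilities sums to 1, which is what categorical_crossentropy expects.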

How many bidirectional LSTM layers to use, and how many is too many? Any advice on a very imbalanced dataset?

Would much appreciate your help.
I am new to RNNs and I am trying to implement an RNN architecture to classify protein sequences. Essentially, they are one-hot encoded NumPy arrays.
I have an issue that the data is very imbalanced:
Examples:
Total: 34909
Positive: 282 (0.81% of total)
Therefore I am planning to implement the weights for the different classes by adding the class_weight=class_weight parameter when model is fitted.
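For the counts given above, class weights and an initial output bias could be derived roughly as follows. This is a sketch of one common weighting scheme (inverse class frequency, scaled so that the weights summed over all examples equal the number of examples), not the only option; the variable names are illustrative.

```python
import math

total = 34909
positive = 282
negative = total - positive

# Weight each class inversely to its frequency; dividing by 2 keeps the
# overall loss magnitude comparable to the unweighted case.
class_weight = {
    0: total / (2.0 * negative),
    1: total / (2.0 * positive),
}

# Initial bias for a sigmoid output so the untrained model predicts the base rate.
output_bias = math.log(positive / negative)
base_rate = 1.0 / (1.0 + math.exp(-output_bias))

print(class_weight)
print(output_bias, base_rate)
```

With these numbers a positive example counts roughly 120 times as much as a negative one, and a sigmoid output initialised with this bias starts out predicting about 0.81% positives instead of 50%.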
I am also planning to use the F1 score on the validation set as the metric, instead of accuracy or loss, as I am not interested in the true negatives.
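To illustrate why F1 can be more informative than accuracy on data like this, here is a small pure-Python sketch. The helper precision_recall_f1 and the toy labels are assumptions made for the example, not part of the code in the question.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels: 2 positives among 10 examples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
all_negative = [0] * 10
one_hit = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy_all_negative = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)
print(accuracy_all_negative, precision_recall_f1(y_true, all_negative))
print(precision_recall_f1(y_true, one_hit))
```

A classifier that always predicts the negative class reaches 80% accuracy here but an F1 of 0, because true negatives do not enter the F1 computation at all.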
Moreover, I am planning to implement transfer learning: I have datasets with more positive data and datasets with only a few points, so I am planning to pretrain a general model and use its weights to train further on the specific problem.
I have come up with the following architecture, but I am not sure whether adding 4 bidirectional LSTM layers is a wise choice:
from keras import regularizers
if output_bias is not None:
    output_bias = Constant(output_bias)
model = Sequential()
# First LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True, recurrent_dropout=0.1), input_shape=(timesteps, features)))
model.add(Dropout(0.5))
# Second LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
model.add(Dropout(0.5))
# Third LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
model.add(Dropout(0.5))
# Fourth LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=False)))
model.add(Dropout(0.5))
#First Dense Layer
model.add(Dense(units=128,kernel_initializer='he_normal',activation='relu'))
model.add(Dropout(0.5))
# Adding the output layer
if output_bias is None:
    model.add(Dense(units=1, activation='sigmoid',kernel_regularizer=regularizers.l2(0.001)))
else:
    model.add(Dense(units=1, activation='sigmoid',
                    bias_initializer=output_bias,kernel_regularizer=regularizers.l2(0.001)))
model.compile(optimizer=Adam(learning_rate=1e-3), loss=BinaryCrossentropy(), metrics=metrics)
model.build()
How do I know how many LSTM layers I should add? Is it just trial and error?
Is there anything else I should include in the layers?
model.summary():
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_13 (Bidirectio (None, 5, 100) 28400
_________________________________________________________________
dropout_16 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_14 (Bidirectio (None, 5, 100) 60400
_________________________________________________________________
dropout_17 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_15 (Bidirectio (None, 5, 100) 60400
_________________________________________________________________
dropout_18 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_16 (Bidirectio (None, 100) 60400
_________________________________________________________________
dropout_19 (Dropout) (None, 100) 0
_________________________________________________________________
dense_7 (Dense) (None, 128) 12928
_________________________________________________________________
dropout_20 (Dropout) (None, 128) 0
_________________________________________________________________
dense_8 (Dense) (None, 1) 129
=================================================================
Total params: 222,657
Trainable params: 222,657
Non-trainable params: 0
I have built this model by going through multiple tutorials such as https://www.tensorflow.org/tutorials/text/text_classification_rnn
and https://www.tensorflow.org/tutorials/structured_data/imbalanced_data
I would appreciate it if you could point me in the right direction.
Thanks!

How to stop overfitting a model in Keras?

I am working on a text classification problem. My model looks like this:
Model: "sequential_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_6 (Embedding) (None, 100, 50) 676050
_________________________________________________________________
lstm_6 (LSTM) (None, 16) 4288
_________________________________________________________________
dropout_1 (Dropout) (None, 16) 0
_________________________________________________________________
dense_6 (Dense) (None, 3) 51
=================================================================
Total params: 680,389
Trainable params: 680,389
Non-trainable params: 0
_________________________________________________________________
None
The dataset contains around 5,300 sentences. I am using validation_split=0.33.
The model behaves abnormally: the validation loss keeps increasing while the validation accuracy stays roughly constant. I am attaching the graph.
Please guide me on how to solve this issue.
My model looks like this:
model=Sequential()
model.add(Embedding(
    num_words,
    EMBEDDING_DIM,
    input_length=MAX_SEQUENCE_LENGTH
))
model.add(LSTM(32,return_sequences=True))
model.add(Dropout(0.5))
model.add(GlobalMaxPool1D())
model.add(Dense(len(possible_labels), activation="softmax"))
I am also attaching the accuracy graph.
Increase dropout.
Train for fewer epochs.
Try Conv1D instead of LSTM to see if the overfitting goes away.
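One way to pick the number of epochs is to stop once the validation loss has not improved for a few epochs, which is the idea behind the Keras EarlyStopping callback. The pure-Python sketch below illustrates that rule on made-up validation losses; best_epoch_with_patience is a hypothetical helper and only approximates what the callback does.

```python
def best_epoch_with_patience(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based, for patience-based early stopping."""
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improving at first, then rising as the model overfits.
val_losses = [1.05, 0.92, 0.88, 0.90, 0.95, 1.02, 1.10]
stop_epoch, best_epoch = best_epoch_with_patience(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

In this toy run training would stop at epoch 5, and the weights from epoch 2 (the lowest validation loss) are the ones worth keeping.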

Expected dense_3_input to have shape (None, 40) but got array with shape (40, 1)

I am a beginner at Deep Learning and am attempting to practice the implementation of Neural Networks in Python by performing audio analysis on a dataset. I have been following the Urban Sound Challenge tutorial and have completed the code for training the model, but I keep running into errors when trying to run the model on the test set.
Here is my code for creation of the model and training:
import numpy as np
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
num_labels = y.shape[1]
filter_size = 2
model = Sequential()
model.add(Dense(256, input_shape = (40,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
model.fit(X, y, batch_size=32, epochs=40, validation_data=(val_X, val_Y))
Running model.summary() before fitting the model gives me:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_3 (Dense) (None, 256) 10496
_________________________________________________________________
activation_3 (Activation) (None, 256) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 256) 0
_________________________________________________________________
dense_4 (Dense) (None, 10) 2570
_________________________________________________________________
activation_4 (Activation) (None, 10) 0
=================================================================
Total params: 13,066
Trainable params: 13,066
Non-trainable params: 0
_________________________________________________________________
After fitting the model, I attempt to run it on one file so that it can classify the sound.
file_name = ".../UrbanSoundClassifier/test/Test/5.wav"
test_X, sample_rate = librosa.load(file_name,res_type='kaiser_fast')
mfccs = np.mean(librosa.feature.mfcc(y=test_X, sr=sample_rate, n_mfcc=40).T,axis=0)
test_X = np.array(mfccs)
print(model.predict(test_X))
However, I get
ValueError: Error when checking : expected dense_3_input to have shape
(None, 40) but got array with shape (40, 1)
Would someone kindly like to point me in the right direction as to how I should be testing the model? I do not know what the input for model.predict() should be.
Full code can be found here.
So:
The easiest fix is simply to reshape test_X so that it has a batch dimension:
test_X = test_X.reshape((1, 40))
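The reshape can be checked in isolation with NumPy. In this sketch mfccs is a made-up stand-in for the 40 averaged MFCC values of one file; no audio is loaded.

```python
import numpy as np

# Stand-in for the 40 averaged MFCC values of one audio file.
mfccs = np.linspace(0.0, 1.0, num=40)
print(mfccs.shape)

# A model built with input_shape=(40,) expects a batch: (n_samples, 40).
single_batch = mfccs.reshape(1, 40)
# Equivalent alternative that does not hard-code the feature count.
also_batch = np.expand_dims(mfccs, axis=0)

print(single_batch.shape)
```

The leading 1 is the batch dimension: a single file is treated as a batch containing one sample.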
A more robust approach is to reuse the pipeline you built for the training and validation sets for the test set as well. Note that the processing you currently apply to the test file is completely different from that pipeline. I'd create a test dataframe:
test_dataframe = pd.DataFrame({'filename': ["here path to test file"]})
and then reuse the existing pipeline that creates the validation set.
