Why LSTM predicts the same value for all test cases? - python

I have a regression task with time-series data: for each observation I need to predict one outcome value. My data is a series of images from which I have hand-crafted 32 features; the images have 10 channels. So the data has a 4D shape (observations, time steps, channels, features), e.g. (3348, 121, 10, 32). After normalisation, one channel of one observation looks like this:
matplotlib.pyplot.matshow(normalized[170,:,0,:].transpose())
The figure shows the 121 time steps on the x-axis and the 32 features on the rows, with feature intensity shown in colour. So there clearly is something happening over time.
Question 1: How to apply RNN in such a task?
Could I somehow use CNN to extract information from my 3rd and 4th axis of my data (as shown in figure above)?
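For illustration, one direction I have been considering (untested) is to let a small convolution summarise the channels-by-features grid of each time step, wrapped in TimeDistributed, and only then feed the resulting sequence to the LSTM. A minimal sketch of what I mean, assuming tf.keras and my shapes (121 time steps, 10 channels, 32 features):
# Hypothetical sketch, not code I have run: per-time-step CNN feeding an LSTM.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TimeDistributed, Conv1D, GlobalMaxPooling1D, LSTM, Dense

cnn_rnn = Sequential([
    # Each time step is a (10 channels, 32 features) slice; convolve across the channel axis.
    TimeDistributed(Conv1D(32, kernel_size=3, padding='same', activation='relu'),
                    input_shape=(121, 10, 32)),
    TimeDistributed(GlobalMaxPooling1D()),   # -> (121, 32) per observation
    LSTM(64),                                # -> (64,) summary of the whole sequence
    Dense(1, activation='linear')            # one regression value per observation
])
cnn_rnn.compile(optimizer='adam', loss='mse')
cnn_rnn.summary()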
A proposed solution (and troubles ahead):
I have flattened the data into 3D. I suspect this throws away structure the learner could use, but at least it (almost) works:
m,n = xtrain4d.shape[:2]
xtrain3d = xtrain4d.reshape(m,n,-1)
Shape is now: (3348, 121, 320). Target ytrain.shape is (3348,).
Here's my LSTM:
def LSTMrecurrentNN(shape1, shape2):
    model = Sequential()
    model.add(LSTM(128, return_sequences=True, input_shape=(shape1, shape2)))
    model.add(Dropout(0.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(16, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='linear'))
    return model
model = LSTMrecurrentNN(xtrain3d.shape[1], xtrain3d.shape[2])
model.summary()
Model: "sequential_12"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_11 (LSTM) (None, 121, 128) 229888
_________________________________________________________________
dropout_18 (Dropout) (None, 121, 128) 0
_________________________________________________________________
dense_21 (Dense) (None, 121, 64) 8256
_________________________________________________________________
dropout_19 (Dropout) (None, 121, 64) 0
_________________________________________________________________
dense_22 (Dense) (None, 121, 16) 1040
_________________________________________________________________
dropout_20 (Dropout) (None, 121, 16) 0
_________________________________________________________________
dense_23 (Dense) (None, 121, 1) 17
=================================================================
Total params: 239,201
Trainable params: 239,201
Non-trainable params: 0
_________________________________________________________________
Running the model:
epochs = 20
batchsize = 128
learningrate = 0.001
epsilon = 0.1
# monitor validation progress:
early = EarlyStopping(monitor="val_loss", mode="min", patience=10)
callbacks_list = [early]
# compile:
model.compile(loss='mean_squared_error',
              optimizer=Adam(learning_rate=learningrate, epsilon=epsilon),
              metrics=['mse'])
# and train the model:
history = model.fit(xtrain3d, ytrain,
                    epochs=epochs, batch_size=batchsize, verbose=0,
                    validation_split=0.20,
                    callbacks=callbacks_list)
# predict:
test_predictions = model.predict(Xtest)
Training and validation performance look OK, but on the test set the model predicts one value for all observations. The figure below shows how, for 10 observations in the test set, the prediction settles already at the early time steps on a value of about 3267, which is close to the mean of the target y.
Statistics tell the same:
scipy.stats.describe(test_predictions[:,-1,0])
DescribeResult(nobs=1544, minmax=(3267.813, 3267.813),
mean=3267.8127, variance=5.964328e-08, skewness=1.0, kurtosis=-2.0)
For target y:
scipy.stats.describe(ytest)
DescribeResult(nobs=1544, minmax=(0.0, 8000.0),
mean=3312.1081606217617,
variance=1381985.8476585718, skewness=0.2847730511366937, kurtosis=0.20894280037919222)
Question 2: Why does the model predict the same value for every observation?
Any hints on how to inspect the LSTM's behaviour (its states)? I would like to know how far back it "remembers".
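For instance, would building a probe model that exposes the LSTM's per-time-step outputs be a sensible way to look at this? A sketch of what I have in mind (assuming the trained model above and tf.keras):
from tensorflow.keras.models import Model
import numpy as np

# layer 0 is the LSTM; with return_sequences=True it emits one 128-vector per time step
probe = Model(inputs=model.input, outputs=model.layers[0].output)
hidden_seq = probe.predict(Xtest[:10])     # shape (10, 121, 128)

# If the hidden outputs barely change across time steps, the LSTM has collapsed
# to a near-constant state and the Dense head can only produce one value.
print(np.std(hidden_seq, axis=1).mean())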

Related

InvalidArgumentError: indices[120,2] = -1 is not in [0, 10) in Keras

I am new to Keras and I am trying to train an LSTM network with the following parameters, but I get this error:
InvalidArgumentError: indices[120,2] = -1 is not in [0, 10)
[[node sequential_3/embedding_3/embedding_lookup (defined at <ipython-input-65-50ea16cb11fb>:5) ]] [Op:__inference_train_function_13886]
Errors may have originated from an input operation.
Input Source operations connected to node sequential_3/embedding_3/embedding_lookup:
sequential_3/embedding_3/embedding_lookup/12643 (defined at /home/jpandeinge/anaconda3/lib/python3.7/contextlib.py:112)
Function call stack:
train_function
Here is a snippet of my code;
# The next step is to split training and testing data. For this we will use sklearn function train_test_split().
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.2)
# features and labels shape
features_train.shape, features_test.shape, labels_train.shape, labels_test.shape
((180568, 82), (45143, 82), (180568,), (45143,))
model = Sequential()
model.add(Embedding(10, 82, input_length=180568))
model.add(LSTM(10, return_sequences=True, input_shape=features_train))
model.add(Activation('sigmoid'))
model.add(Dropout(0.2))
model.build()
model.compile(loss = 'binary_crossentropy', optimizer='adam', metrics = ['accuracy'])
model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_3 (Embedding) (None, 180568, 82) 820
_________________________________________________________________
lstm_3 (LSTM) (None, 180568, 10) 3720
_________________________________________________________________
activation_3 (Activation) (None, 180568, 10) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 180568, 10) 0
=================================================================
Total params: 4,540
Trainable params: 4,540
Non-trainable params: 0
________________________
history = model.fit(features_train,
labels_train,
epochs=10,
batch_size=128)
features_train should contain indices in the range 0 to 9 (that is what your Embedding layer expects), but features_train[120, 2] equals -1.
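A quick way to confirm and work around this, assuming features_train is an integer NumPy array as in your snippet (a sketch, adjust to your data):
import numpy as np

# Embedding(10, ...) only accepts indices 0..9, so check the actual range first:
print(features_train.min(), features_train.max())

# Option 1: if -1 is noise, clip it into the valid range.
features_train = np.clip(features_train, 0, 9)

# Option 2: if -1 means "unknown", shift everything up by one instead and give
# the Embedding layer one extra slot: Embedding(input_dim=11, ...)
# features_train = features_train + 1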

how many bidirectional lstm layers to use and how many is too many? Any advice on very imbalanced dataset?

Would much appreciate your help.
I am new to RNNs and am trying to implement an RNN architecture to classify protein sequences; essentially they are one-hot encoded NumPy arrays.
My problem is that the data is very imbalanced:
Examples:
Total: 34909
Positive: 282 (0.81% of total)
Therefore I am planning to weight the classes by passing class_weight=class_weight when the model is fitted (see the sketch below).
I am also planning to use the F1 score on the validation set as a metric instead of accuracy or loss, because I am not interested in the true negatives.
Moreover, I am planning to use transfer learning: I have datasets with more positive examples and datasets with only a few data points, so I plan to pretrain a general model and use its weights to further train on the specific problem.
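Concretely, for the class weights, the output-layer bias and the validation F1 I currently have something like this in mind, loosely following the imbalanced-data tutorial linked at the end (a sketch with my counts; x_val and y_val are placeholders for my validation arrays):
import numpy as np
from sklearn.metrics import f1_score
from keras.callbacks import Callback

total, pos = 34909, 282
neg = total - pos

# Weight the classes so both contribute equally to the loss
# (same scheme as the TensorFlow imbalanced-data tutorial).
class_weight = {
    0: (1 / neg) * (total / 2.0),
    1: (1 / pos) * (total / 2.0),
}

# Initial bias for the sigmoid output so the model starts near the base rate.
output_bias = np.log(pos / neg)

class F1Callback(Callback):
    # Report F1 on the validation set at the end of every epoch.
    def __init__(self, x_val, y_val):
        super().__init__()
        self.x_val, self.y_val = x_val, y_val

    def on_epoch_end(self, epoch, logs=None):
        preds = (self.model.predict(self.x_val) > 0.5).astype(int).ravel()
        print(' - val_f1: {:.4f}'.format(f1_score(self.y_val, preds)))

# later: model.fit(..., class_weight=class_weight, callbacks=[F1Callback(x_val, y_val)])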
I have come up with the architecture below, but I am not sure whether adding 4 bidirectional LSTM layers is a wise choice:
from keras import regularizers
from keras.initializers import Constant
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dropout, Dense
from keras.optimizers import Adam
from keras.losses import BinaryCrossentropy

if output_bias is not None:
    output_bias = Constant(output_bias)
model = Sequential()
# First LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True, recurrent_dropout=0.1),
                        input_shape=(timesteps, features)))
model.add(Dropout(0.5))
# Second LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
model.add(Dropout(0.5))
# Third LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True)))
model.add(Dropout(0.5))
# Fourth LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=False)))
model.add(Dropout(0.5))
# First Dense layer
model.add(Dense(units=128, kernel_initializer='he_normal', activation='relu'))
model.add(Dropout(0.5))
# Output layer
if output_bias is None:
    model.add(Dense(units=1, activation='sigmoid',
                    kernel_regularizer=regularizers.l2(0.001)))
else:
    model.add(Dense(units=1, activation='sigmoid',
                    bias_initializer=output_bias,
                    kernel_regularizer=regularizers.l2(0.001)))
model.compile(optimizer=Adam(lr=1e-3), loss=BinaryCrossentropy(), metrics=metrics)
model.build()
How do I know how many LSTM layers I should add? Is it just trial and error?
Is there anything else I should include in the layers?
model.summary():
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_13 (Bidirectio (None, 5, 100) 28400
_________________________________________________________________
dropout_16 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_14 (Bidirectio (None, 5, 100) 60400
_________________________________________________________________
dropout_17 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_15 (Bidirectio (None, 5, 100) 60400
_________________________________________________________________
dropout_18 (Dropout) (None, 5, 100) 0
_________________________________________________________________
bidirectional_16 (Bidirectio (None, 100) 60400
_________________________________________________________________
dropout_19 (Dropout) (None, 100) 0
_________________________________________________________________
dense_7 (Dense) (None, 128) 12928
_________________________________________________________________
dropout_20 (Dropout) (None, 128) 0
_________________________________________________________________
dense_8 (Dense) (None, 1) 129
=================================================================
Total params: 222,657
Trainable params: 222,657
Non-trainable params: 0
I have built this model by going through multiple tutorials such as https://www.tensorflow.org/tutorials/text/text_classification_rnn
and https://www.tensorflow.org/tutorials/structured_data/imbalanced_data
Would appreciate it if you could point me in the right direction.
Thanks!

Keras: How to connect a CNN model with a decision tree

I want to train a model to predict a person's emotion from physical signals. I have one physical signal that I am using as the input feature:
ecg (electrocardiography)
I want to use a CNN architecture to extract features from the data and then feed these extracted features into a classical decision tree classifier. Below is my CNN approach without the decision tree:
model = Sequential()
model.add(Conv1D(15,60,padding='valid', activation='relu',input_shape=(18000,1), strides = 1, kernel_regularizer=regularizers.l1_l2(l1=0.1, l2=0.1)))
model.add(MaxPooling1D(2,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Conv1D(30, 60, padding='valid', activation='relu',kernel_regularizer = regularizers.l1_l2(l1=0.1, l2=0.1), strides=1))
model.add(MaxPooling1D(4,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(3, activation = 'softmax'))
I want to edit this code so that the output layer is a working decision tree instead of model.add(Dense(3, activation='softmax')). I have tried to save the outputs of the last convolutional layer like this:
output = model.layers[-6].output
When I print the output variable, the result is:
THE OUTPUT: Tensor("conv1d_56/Relu:0", shape=(?, 8971, 30),
dtype=float32)
I guess the output variable holds the extracted features. Now, how can I feed my decision tree classifier with the data stored in output? Here is the decision tree from scikit-learn:
from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier(criterion = 'entropy')
dtc.fit()
How should I feed the fit() method? Thanks in advance.
To extract a vector of features that you can pass on to another algorithm, you need a fully connected layer before your softmax layer. Something like this will add a 128-dimensional layer just before your softmax layer:
model = Sequential()
model.add(Conv1D(15,60,padding='valid', activation='relu',input_shape=(18000,1), strides = 1, kernel_regularizer=regularizers.l1_l2(l1=0.1, l2=0.1)))
model.add(MaxPooling1D(2,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Conv1D(30, 60, padding='valid', activation='relu',kernel_regularizer = regularizers.l1_l2(l1=0.1, l2=0.1), strides=1))
model.add(MaxPooling1D(4,data_format='channels_last'))
model.add(Dropout(0.6))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3, activation = 'softmax'))
If you then run model.summary() you can see the name of the layers:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_9 (Conv1D) (None, 17941, 15) 915
_________________________________________________________________
max_pooling1d_9 (MaxPooling1 (None, 8970, 15) 0
_________________________________________________________________
dropout_10 (Dropout) (None, 8970, 15) 0
_________________________________________________________________
batch_normalization_9 (Batch (None, 8970, 15) 60
_________________________________________________________________
conv1d_10 (Conv1D) (None, 8911, 30) 27030
_________________________________________________________________
max_pooling1d_10 (MaxPooling (None, 2227, 30) 0
_________________________________________________________________
dropout_11 (Dropout) (None, 2227, 30) 0
_________________________________________________________________
batch_normalization_10 (Batc (None, 2227, 30) 120
_________________________________________________________________
flatten_6 (Flatten) (None, 66810) 0
_________________________________________________________________
dense_7 (Dense) (None, 128) 8551808
_________________________________________________________________
dropout_12 (Dropout) (None, 128) 0
_________________________________________________________________
dense_8 (Dense) (None, 3) 387
=================================================================
Total params: 8,580,320
Trainable params: 8,580,230
Non-trainable params: 90
_________________________________________________________________
Once your network has been trained you can create a new model where the output layer becomes 'dense_7' and it'll generate 128 dimensional feature vectors:
feature_vectors_model = Model(model.input, model.get_layer('dense_7').output)
dtc_features = feature_vectors_model.predict(your_X_data) # fit your decision tree on this data
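From there, fitting the scikit-learn tree is the usual fit/predict pattern (your_X_data, your_y_data and new_X_data stand in for your own arrays, as in the line above):
from sklearn.tree import DecisionTreeClassifier

dtc = DecisionTreeClassifier(criterion='entropy')
# If your labels are one-hot (as implied by the softmax), convert them to class ids first.
your_y_labels = your_y_data.argmax(axis=1)
dtc.fit(dtc_features, your_y_labels)

# At prediction time, run new signals through the same feature extractor first:
new_features = feature_vectors_model.predict(new_X_data)
print(dtc.predict(new_features))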

Expected dense_3_input to have shape (None, 40) but got array with shape (40, 1)

I am a beginner at Deep Learning and am attempting to practice the implementation of Neural Networks in Python by performing audio analysis on a dataset. I have been following the Urban Sound Challenge tutorial and have completed the code for training the model, but I keep running into errors when trying to run the model on the test set.
Here is my code for creation of the model and training:
import numpy as np
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
num_labels = y.shape[1]
filter_size = 2
model = Sequential()
model.add(Dense(256, input_shape = (40,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
model.fit(X, y, batch_size=32, epochs=40, validation_data=(val_X, val_Y))
Running model.summary() before fitting the model gives me:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_3 (Dense) (None, 256) 10496
_________________________________________________________________
activation_3 (Activation) (None, 256) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 256) 0
_________________________________________________________________
dense_4 (Dense) (None, 10) 2570
_________________________________________________________________
activation_4 (Activation) (None, 10) 0
=================================================================
Total params: 13,066
Trainable params: 13,066
Non-trainable params: 0
_________________________________________________________________
After fitting the model, I attempt to run it on one file so that it can classify the sound.
file_name = ".../UrbanSoundClassifier/test/Test/5.wav"
test_X, sample_rate = librosa.load(file_name,res_type='kaiser_fast')
mfccs = np.mean(librosa.feature.mfcc(y=test_X, sr=sample_rate, n_mfcc=40).T,axis=0)
test_X = np.array(mfccs)
print(model.predict(test_X))
However, I get
ValueError: Error when checking : expected dense_3_input to have shape
(None, 40) but got array with shape (40, 1)
Would someone kindly point me in the right direction as to how I should be testing the model? I do not know what the input for model.predict() should be.
Full code can be found here.
There are two options.
The easiest fix is simply to reshape test_X:
test_X = test_X.reshape((1, 40))
More sophisticated is to reuse the pipeline you have for creating the training and validation sets for the test set as well; note that the processing you currently apply to the test file is completely different. I'd create a test dataframe:
test_dataframe = pd.DataFrame({'filename': ["here path to test file"]})
and then reuse the existing pipeline for creating the validation set.
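Put together, a minimal sketch of the quick fix, reusing the MFCC extraction from the question (the file path is the question's placeholder):
import numpy as np
import librosa

file_name = ".../UrbanSoundClassifier/test/Test/5.wav"
audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
mfccs = np.mean(librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40).T, axis=0)

# The network expects a batch axis: (1, 40) rather than (40,).
test_X = mfccs.reshape(1, 40)
print(model.predict(test_X))   # one row of class probabilities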

Keras, why is adding layers to a model so slow?

I am trying to build a really large model in Keras with 3 LSTM layers of 4096 hidden units each. Previously I had 1024 hidden units per layer, and the build time for that network was reasonable: each layer was added in about 1 to 2 seconds. Now that the model has 4096 hidden units per layer, adding each layer takes about 5 minutes. What I find strange is that the slowness happens during the three calls to model.add(LSTM...), not during model.compile(...). I need to use a larger network, but this wait time is a little unbearable. It is not so bad for training, since that takes much longer anyway, but I don't want to sit through it every time I want to generate test output. Why does add take so much time? Isn't add just defining the layer, with all the time spent in the compile function? Also, is there anything I can do about it?
print('Building Model')
model = Sequential()
model.add(LSTM(lstm_size, batch_input_shape = (batch_size, 1, len(bytes_set)), stateful = True, return_sequences = True, consume_less = consume_less))
model.add(Dropout(0.5))
print('Added LSTM_1')
model.add(LSTM(lstm_size, stateful = True, return_sequences = True, consume_less = consume_less))
model.add(Dropout(0.5))
print('Added LSTM_2')
model.add(LSTM(lstm_size, stateful = True, return_sequences = False, consume_less = consume_less))
model.add(Dropout(0.5))
print('Added LSTM_3')
model.add(Dense(len(bytes_set), activation = 'softmax'))
print('Compiling Model')
model.compile(optimizer = SGD(lr = 0.3, momentum = 0.9, decay = 1e-5, nesterov = True),
loss = 'categorical_crossentropy',
metrics = ['accuracy'])
Here is my .theanorc
[global]
floatX = float32
mode = FAST_RUN
device = gpu
exception_verbosity = high
[nvcc]
fastmath = 1
Here is my model summary as requested. Unfortunately, I have been running this new version for the past few hours, so I don't want to make any new changes. This model has 4 LSTM layers of size 1500 each.
Layer (type) Output Shape Param # Connected to
====================================================================================================
lstm_1 (LSTM) (64, 1, 1500) 9774000 lstm_input_1[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout) (64, 1, 1500) 0 lstm_1[0][0]
____________________________________________________________________________________________________
lstm_2 (LSTM) (64, 1, 1500) 18006000 dropout_1[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout) (64, 1, 1500) 0 lstm_2[0][0]
____________________________________________________________________________________________________
lstm_3 (LSTM) (64, 1, 1500) 18006000 dropout_2[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout) (64, 1, 1500) 0 lstm_3[0][0]
____________________________________________________________________________________________________
lstm_4 (LSTM) (64, 1500) 18006000 dropout_3[0][0]
____________________________________________________________________________________________________
dropout_4 (Dropout) (64, 1500) 0 lstm_4[0][0]
____________________________________________________________________________________________________
dense_1 (Dense) (64, 128) 192128 dropout_4[0][0]
====================================================================================================
Total params: 63984128
____________________________________________________________________________________________________
It's slow because you are trying to allocate matrices that need at least 0.5 GB of memory. 4096 units with ~4097 incoming weights each is already a huge number, and an LSTM has additional inner weights for the input, output and forget gates. As you can see, this adds up to a huge amount of memory.
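To get a feel for the numbers: an LSTM layer has roughly 4 * (n * (n + d) + n) parameters for n units and input dimension d (four gate matrices over the input and the recurrent state, plus biases). A quick back-of-the-envelope check, assuming each layer's input is another layer of the same size:
def lstm_params(units, input_dim):
    # 4 gates, each with an input matrix, a recurrent matrix and a bias vector
    return 4 * (units * (units + input_dim) + units)

for n in (1024, 4096):
    params = lstm_params(n, n)
    print(n, params, '%.2f GB as float32' % (params * 4 / 1e9))
# 1024 ->   ~8.4M parameters (~0.03 GB per layer)
# 4096 -> ~134.2M parameters (~0.54 GB per layer)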
UPDATE
I wrote my answer on my phone and wrote TB instead of GB. You can easily check the size of your model by calling:
model.summary()
in both cases (1024 and 4096). Please share your results in a comment because I'm interested :)
