ValueError with Shapes using Bidirectional LSTM - python

I am trying to implement a bidirectional LSTM for a sequence-to-sequence model. I have already one-hot-encoded my sequences, giving 12 features in total. The input is 11 steps long while the output is 23 steps long. First, I wrote this LSTM implementation, which works, with the first LSTM as the encoder and the second as the decoder:
model = Sequential()
model.add(LSTM(75, input_shape=(11, 12)))
model.add(RepeatVector(23))
model.add(LSTM(50, return_sequences=True))
model.add(TimeDistributed(Dense(12, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
X, y = generate_data(1, taskset, trainset)
model.fit(X, y, epochs=1, batch_size=32, verbose=1)
I then tried to turn this into a bidirectional LSTM as follows:
model = Sequential()
model.add(Bidirectional(LSTM(75, return_sequences=True), input_shape=(11,12), merge_mode='concat'))
model.add(Bidirectional(LSTM(50, return_sequences=True)))
model.add(TimeDistributed(Dense(12, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
X, y = generate_data(1, taskset, trainset)
model.fit(X, y, epochs=1, batch_size=32, verbose=1)
The goal is to use the first bidirectional LSTM as the encoder and the second bidirectional LSTM as the decoder. I removed the RepeatVector in the bidirectional implementation because it gave me a dimension error (needed dim=2, received dim=3). With the current bidirectional LSTM I am getting this error:
ValueError: Shapes (None, 23, 12) and (None, 11, 12) are incompatible
Any help with fixing the bidirectional LSTM implementation?

Simply setting return_sequences=False in your first bidirectional LSTM and adding RepeatVector(23) back, as in your original model, works fine. With return_sequences=True the encoder keeps all 11 input timesteps, so the model outputs shape (None, 11, 12) while your targets have shape (None, 23, 12), which is exactly the mismatch in the error. Collapsing the encoder output to a single vector and repeating it 23 times with RepeatVector restores the expected output length:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, RepeatVector, TimeDistributed, Dense

n_sample = 10
X = np.random.uniform(0, 1, (n_sample, 11, 12))
y = np.random.randint(0, 2, (n_sample, 23, 12))

model = Sequential()
model.add(Bidirectional(LSTM(75), input_shape=(11, 12), merge_mode='concat'))
model.add(RepeatVector(23))
model.add(Bidirectional(LSTM(50, return_sequences=True)))
model.add(TimeDistributed(Dense(12, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=3, batch_size=32, verbose=1)
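As a quick sanity check with the toy X and y above, you can confirm that the model's output time dimension now matches the 23-step targets:
# After adding RepeatVector(23), the model should output 23 timesteps of 12 features.
print(model.output_shape)  # expected: (None, 23, 12)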

Related

ValueError in model.fit in lstm

I am trying to fit an LSTM model to my data, which I read from a CSV file. The shape of x_train is (320, 6), and the model is given as:
def build_modelLSTMlite(input_shape):
    model = keras.Sequential()
    model.add(keras.layers.LSTM(64, input_shape=input_shape))
    model.add(keras.layers.Dense(64, activation='relu'))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Dense(10, activation='softmax'))
    return model

model = build_modelLSTMlite(input_shape)
optimiser = keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimiser,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
history = model.fit(x_train, y_train, batch_size=32, epochs=100)
This model.fit() call raises a ValueError:
ValueError: Input 0 of layer "sequential_1" is incompatible with the layer: expected shape=(None, 320, 6), found shape=(32, 6)
You have to create a sliding version of your x_train data. Something like:
from numpy.lib.stride_tricks import sliding_window_view

x_train_lstm = sliding_window_view(x_train, (input_shape[0], x_train.shape[1])).squeeze(axis=1)
history = model.fit(x_train_lstm, y_train[:-input_shape[0]+1], batch_size=32, epochs=100)
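To see what the sliding window produces, here is a minimal self-contained sketch with dummy data; the window length of 10 is an illustrative assumption, not a value from the question:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x_train = np.random.rand(320, 6)   # same shape as in the question
window = 10                        # hypothetical sequence length fed to the LSTM

# sliding_window_view returns shape (311, 1, 10, 6); squeeze removes the size-1 axis
x_train_lstm = sliding_window_view(x_train, (window, x_train.shape[1])).squeeze(axis=1)
print(x_train_lstm.shape)          # (311, 10, 6), i.e. 320 - 10 + 1 windows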

Why is my multiclass neural model not training (accuracy and loss staying same)?

I am learning neural networks. I get 98% accuracy with classical ML methods, so I think I made a coding error. The neural network model is not learning.
Things I tried:
Changing X and y to float64 or float32
Normalizing data
Changing the activation to "linear" or "relu"
Removing Flatten()
Adding hidden layers
Using stochastic gradient descent as optimizer, instead of "adam".
Changing the y label with another label
There are 9 features in X_train and 8 different classes in y_train.
Code:
model = keras.models.Sequential()
model.add(keras.layers.Input(shape=(9,)))
model.add(keras.layers.Dense(8, activation='softmax'))
model.add(layers.Flatten())
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Fitting:
I tried these lines, changing the target label each time. None of them helps train the model: some give "nan" loss, some go slightly up and down, but all of them stay below 0.1% accuracy:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, batch_size=24)
or this:
model = tf.keras.Sequential()
model.add(layers.Input(shape=(9,)))
model.add(layers.Dense(3, activation='relu', name='relu1'))
model.add(layers.Dense(16, activation='relu', name='relu2'))
model.add(layers.Dense(16, activation='relu', name='relu3'))
model.add(layers.Dense(1, name='dense1'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history = model.fit(x=X_train, y=y_train, epochs=20)
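For comparison, here is a minimal sketch of a setup that typically does train for 8-class classification on 9 input features, assuming y_train contains integer class indices 0-7; it is not the poster's code, just a baseline to contrast with the snippets above:
import tensorflow as tf
from tensorflow.keras import layers

# Hidden ReLU layers plus an 8-unit softmax output (one unit per class),
# paired with sparse_categorical_crossentropy for integer labels.
model = tf.keras.Sequential([
    layers.Input(shape=(9,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(8, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(X_train, y_train, epochs=20, batch_size=24)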

Audio processing Conv1D keras

I am learning Keras through audio classification. I am implementing the code, with modifications, from https://github.com/deepsound-project/genre-recognition/blob/master/train_model.py.
The shape of the dataset is
X_train shape = (800, 32, 1)
y_train shape = (800, 10)
X_test shape = (200, 32, 1)
y_test shape = (200, 10)
The model
model = Sequential()
model.add(Conv1D(filters=256, kernel_size=5, input_shape=(32, 1), activation="relu"))
model.add(BatchNormalization(momentum=0.9))
model.add(MaxPooling1D(2))
model.add(Dropout(0.5))
model.add(Conv1D(filters=256, kernel_size=5, activation="relu"))
model.add(BatchNormalization(momentum=0.9))
model.add(MaxPooling1D(2))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dense(10, activation='softmax'))
model.compile(
    loss='categorical_crossentropy',
    optimizer=Adam(lr=0.001),
    metrics=['accuracy'],
)
model.summary()

red_lr = ReduceLROnPlateau(monitor='val_loss', patience=2, verbose=2, factor=0.5, min_delta=0.01)
check = ModelCheckpoint(filepath=r'/content/drive/My Drive/Colab Notebooks/gen/cnn.hdf5', verbose=1, save_best_only=True)

History = model.fit(X_train,
                    y_train,
                    epochs=100,
                    #batch_size=512,
                    validation_data=(X_test, y_test),
                    verbose=2,
                    callbacks=[check, red_lr],
                    shuffle=True)
The accuracy graph
Loss graph
I do not understand why the val_acc stays in the range of 70%. I tried modifying the model architecture, including the optimizer, but there was no improvement.
Also, is it fine to have a large difference between loss and val_loss?
How can I get the accuracy above 80%? Any help is appreciated.
Thank you
I found it: I used the concatenate function from Keras to concatenate all the convolution layers, and it gives the best performance.
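The answer does not show the exact architecture, but a multi-branch layout of that kind can be sketched with the Keras functional API roughly as follows; the branch count and kernel sizes are illustrative assumptions, not the poster's actual model:
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense, Concatenate
from tensorflow.keras.models import Model

inp = Input(shape=(32, 1))

# Three parallel Conv1D branches with different kernel sizes (illustrative choice).
branches = []
for k in (3, 5, 7):
    b = Conv1D(filters=128, kernel_size=k, padding='same', activation='relu')(inp)
    b = MaxPooling1D(2)(b)
    b = Flatten()(b)
    branches.append(b)

# Concatenate the branch outputs before the classifier head.
merged = Concatenate()(branches)
out = Dense(128, activation='relu')(merged)
out = Dense(10, activation='softmax')(out)

model = Model(inp, out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])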

Loading saved model (Bidirectional LSTM) in Keras

I trained and saved a Bidirectional LSTM model in Keras successfully with:
model = Sequential()
model.add(Bidirectional(LSTM(N_HIDDEN_NEURONS,
                             return_sequences=True,
                             activation="tanh",
                             input_shape=(SEGMENT_TIME_SIZE, N_FEATURES))))
model.add(Bidirectional(LSTM(N_HIDDEN_NEURONS)))
model.add(Dropout(0.5))
model.add(Dense(N_CLASSES, activation='sigmoid'))
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train,
          batch_size=BATCH_SIZE,
          epochs=N_EPOCHS,
          validation_data=[X_test, y_test])
model.save('model_keras/model.h5')
However, when I want to load it with:
model = load_model('model_keras/model.h5')
I get an error:
ValueError: You are trying to load a weight file containing 3 layers into a model with 0 layers.
I also tried different methods like saving and loading model architecture and weights separately but none of them worked for me. Also, previously, when I was using normal (unidirectional) LSTMs, loading the model worked fine.
As mentioned by @mpariente and @today, input_shape is an argument of Bidirectional, not of LSTM; see the Keras documentation. My solution:
# Model
model = Sequential()
model.add(Bidirectional(LSTM(N_HIDDEN_NEURONS,
                             return_sequences=True,
                             activation="tanh"),
                        input_shape=(SEGMENT_TIME_SIZE, N_FEATURES)))
model.add(Bidirectional(LSTM(N_HIDDEN_NEURONS)))
model.add(Dropout(0.5))
model.add(Dense(N_CLASSES, activation='sigmoid'))
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train,
          batch_size=BATCH_SIZE,
          epochs=N_EPOCHS,
          validation_data=[X_test, y_test])
model.save('model_keras/model.h5')
and then, to load, simply do:
model = load_model('model_keras/model.h5')
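As an optional check, assuming the model object and X_test from the code above are still in memory, you can verify that the reloaded model produces the same predictions as the one that was saved:
import numpy as np
from keras.models import load_model

reloaded = load_model('model_keras/model.h5')
# Predictions from the saved and reloaded models should match up to float tolerance.
assert np.allclose(model.predict(X_test), reloaded.predict(X_test))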

Why does a binary Keras CNN always predict 1?

I want to build a binary classifier using a Keras CNN.
I have about 6000 rows of input data which looks like this:
>> print(X_train[0])
[[[-1.06405307 -1.06685851 -1.05989663 -1.06273152]
[-1.06295958 -1.06655996 -1.05969803 -1.06382503]
[-1.06415248 -1.06735609 -1.05999593 -1.06302975]
[-1.06295958 -1.06755513 -1.05949944 -1.06362621]
[-1.06355603 -1.06636092 -1.05959873 -1.06173742]
[-1.0619655 -1.06655996 -1.06039312 -1.06412326]
[-1.06415248 -1.06725658 -1.05940014 -1.06322857]
[-1.06345662 -1.06377347 -1.05890365 -1.06034568]
[-1.06027557 -1.06019084 -1.05592469 -1.05537518]
[-1.05550398 -1.06038988 -1.05225064 -1.05676692]]]
>>> print(y_train[0])
[1]
Then I built a CNN this way:
model = Sequential()
model.add(Convolution1D(input_shape=(10, 4),
                        nb_filter=16,
                        filter_length=4,
                        border_mode='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(Convolution1D(nb_filter=8,
                        filter_length=4,
                        border_mode='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(64))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(1))
model.add(Activation('softmax'))

reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.9, patience=30, min_lr=0.000001, verbose=0)

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(X_train, y_train,
                    nb_epoch=100,
                    batch_size=128,
                    verbose=0,
                    validation_data=(X_test, y_test),
                    callbacks=[reduce_lr],
                    shuffle=True)

y_pred = model.predict(X_test)
But it returns the following:
>> print(confusion_matrix(y_test, y_pred))
[[ 0 362]
[ 0 608]]
Why are all predictions ones? Why does the CNN perform so badly?
Here are the loss and acc charts:
It always predicts one because of the output layer in your network. You have a Dense layer with one neuron and a softmax activation. Softmax normalizes by the sum of the exponentials of the outputs, so with a single output the only possible value is 1.0.
For a binary classifier you can either use a sigmoid activation with the binary_crossentropy loss, or put two output units in the last layer, keep using softmax, and change the loss to categorical_crossentropy.
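A minimal sketch of the first option: rebuild the model exactly as posted, but end it with a sigmoid output and threshold the predictions before computing the confusion matrix:
# Replace the final layers
#   model.add(Dense(1))
#   model.add(Activation('softmax'))
# with a single sigmoid output unit:
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# At prediction time, threshold the sigmoid output so that
# confusion_matrix receives 0/1 class labels:
y_pred = (model.predict(X_test) > 0.5).astype(int)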
