I am trying to make a multiclass classifier in Keras, but I am getting a dimension mismatch in the Dense layer.
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Bidirectional, TimeDistributed, Dense

MAX_SENT_LENGTH = 100
MAX_SENTS = 15
EMBEDDING_DIM = 100

x_train = data[:-nb_validation_samples]
y_train = labels[:-nb_validation_samples]
x_val = data[-nb_validation_samples:]
y_val = labels[-nb_validation_samples:]

# Sentence encoder: embed each sentence and summarize it with a BiLSTM
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SENT_LENGTH,
                            trainable=True)
sentence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sentence_input)
l_lstm = Bidirectional(LSTM(100))(embedded_sequences)
sentEncoder = Model(sentence_input, l_lstm)

# Review encoder: apply the sentence encoder to every sentence in a review
review_input = Input(shape=(MAX_SENTS, MAX_SENT_LENGTH), dtype='int32')
review_encoder = TimeDistributed(sentEncoder)(review_input)
l_lstm_sent = Bidirectional(LSTM(100))(review_encoder)
preds = Dense(7, activation='softmax')(l_lstm_sent)

model = Model(review_input, preds)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=10, batch_size=50)
The class labels are correctly transformed into one-hot vectors, but when I try to fit the model, I get this mismatch error:
('Shape of data tensor:', (5327, 15, 100))
('Shape of label tensor:', (5327, 7))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 15, 100) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 15, 200) 351500
_________________________________________________________________
bidirectional_2 (Bidirection (None, 200) 240800
_________________________________________________________________
dense_1 (Dense) (None, 7) 1407
=================================================================
Total params: 592,501
Trainable params: 592,501
Non-trainable params: 0
_________________________________________________________________
None
ValueError: Error when checking target: expected dense_1 to have
shape (None, 1) but got array with shape (4262, 7)
Where does this (None, 1) dimension come from and how can I solve this error?
You should use loss='categorical_crossentropy' instead of loss='sparse_categorical_crossentropy' if your labels are one-hot encoded. 'sparse_categorical_crossentropy' expects integer class labels, which is why a (None, 1) target shape is required.
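A minimal sketch of the two equivalent fixes, using the variable names from the question:
import numpy as np

# Option A: keep the one-hot labels and switch the loss
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10, batch_size=50)

# Option B: keep sparse_categorical_crossentropy and feed integer class ids
y_train_int = np.argmax(y_train, axis=-1)  # shape (5327,), ints in [0, 6]
y_val_int = np.argmax(y_val, axis=-1)
model.fit(x_train, y_train_int, validation_data=(x_val, y_val_int), epochs=10, batch_size=50)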
I'm fairly new to TensorFlow and would appreciate any answers.
I'm trying to use a transformer model as an embedding layer and feed the data to a custom model.
import tensorflow as tf
from tensorflow.keras import layers
from transformers import TFAutoModel

def build_model():
    transformer_model = TFAutoModel.from_pretrained(MODEL_NAME, config=config)

    input_ids_in = layers.Input(shape=(MAX_LEN,), name='input_ids', dtype='int32')
    input_masks_in = layers.Input(shape=(MAX_LEN,), name='attention_mask', dtype='int32')

    embedding_layer = transformer_model(input_ids_in, attention_mask=input_masks_in)[0]
    X = layers.Bidirectional(tf.keras.layers.LSTM(50, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(embedding_layer)
    X = layers.GlobalMaxPool1D()(X)
    X = layers.Dense(64, activation='relu')(X)
    X = layers.Dropout(0.2)(X)
    X = layers.Dense(30, activation='softmax')(X)
    model = tf.keras.Model(inputs=[input_ids_in, input_masks_in], outputs=X)

    # Freeze the transformer layers so only the new head is trained
    for layer in model.layers[:3]:
        layer.trainable = False

    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
model = build_model()
model.summary()

r = model.fit(train_ds,
              steps_per_epoch=train_steps,
              epochs=EPOCHS,
              verbose=3)
I have 30 classes and the labels are not one-hot encoded, so I'm using sparse_categorical_crossentropy as my loss function, but I keep getting the following error:
ValueError: Shape mismatch: The shape of labels (received (1,)) should equal the shape of logits except for the last dimension (received (10, 30)).
How can I solve this?
And why is the (10, 30) shape required? I know the 30 comes from the last Dense layer with 30 units, but why the 10? Is it because of MAX_LEN, which is 10?
My model summary:
Model: "model_16"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_ids (InputLayer) [(None, 10)] 0
__________________________________________________________________________________________________
attention_mask (InputLayer) [(None, 10)] 0
__________________________________________________________________________________________________
tf_bert_model_21 (TFBertModel) TFBaseModelOutputWit 162841344 input_ids[0][0]
attention_mask[0][0]
__________________________________________________________________________________________________
bidirectional_17 (Bidirectional (None, 10, 100) 327600 tf_bert_model_21[0][0]
__________________________________________________________________________________________________
global_max_pooling1d_15 (Global (None, 100) 0 bidirectional_17[0][0]
__________________________________________________________________________________________________
dense_32 (Dense) (None, 64) 6464 global_max_pooling1d_15[0][0]
__________________________________________________________________________________________________
dropout_867 (Dropout) (None, 64) 0 dense_32[0][0]
__________________________________________________________________________________________________
dense_33 (Dense) (None, 30) 1950 dropout_867[0][0]
==================================================================================================
Total params: 163,177,358
Trainable params: 336,014
Non-trainable params: 162,841,344
The 10 is the number of sequences in one batch; I suspect it is the number of sequences in your dataset.
Your model acts as a sequence classifier, so you should have one label for every sequence: with sparse_categorical_crossentropy, each batch of labels must have shape (10,) to match the logits of shape (10, 30) in everything but the last dimension.
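A minimal sketch of a dataset whose label batches line up with the logits, assuming input_ids, attention_masks, and labels are NumPy arrays of the same length (these variable names are assumptions, not from the question):
import tensorflow as tf

# Each element pairs a dict of model inputs with one integer class id per sequence
train_ds = tf.data.Dataset.from_tensor_slices(
    ({'input_ids': input_ids, 'attention_mask': attention_masks}, labels)
).batch(10)

for x, y in train_ds.take(1):
    print(y.shape)  # expect (10,): one label per sequence, matching logits (10, 30)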
I have trained a CNN model and I am getting an error on the Embedding layer; I use my pre-trained embeddings for classification.
My code:
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D

embed_size = 300
max_features = 100000
maxlen = 254
num_filters = 100
nb_words = min(max_features, len(word_index)) + 1

model = Sequential()
model.add(Embedding(nb_words, embed_size, trainable=False))
model.add(Conv1D(num_filters, 7, activation='relu', padding='same'))
model.add(MaxPooling1D(2))
model.add(Conv1D(num_filters, 7, activation='relu', padding='same'))
Model Summary
Model: "sequential_47"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_38 (Embedding) (None, None, 300) 537900
_________________________________________________________________
conv1d_39 (Conv1D) (None, None, 100) 210100
_________________________________________________________________
max_pooling1d_19 (MaxPooling (None, None, 100) 0
_________________________________________________________________
conv1d_40 (Conv1D) (None, None, 100) 70100
=================================================================
Total params: 818,100
Trainable params: 280,200
Non-trainable params: 537,900
_________________________________________________________________
I tried to train the model:
from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss', min_delta=0.01, patience=4, verbose=1)
callbacks_list = [early_stopping]
model.fit(train_X, y_train, batch_size=1000, epochs=10, verbose=1,
          callbacks=callbacks_list)
ValueError: Error when checking target: expected embedding_37 to have 3 dimensions, but got array with shape (90, 1)
train_X.shape
(90, 352)
Any help will be greatly appreciated.
Thank you
An LSTM model on non-text data is trained to classify two classes.
I have 225 time points for each product (N = 730), with 167 features including the target. Only the last time point is to be predicted.
I use the target as a feature in prediction; here is how I prepare the input:
def split_sequences(sequences, n_steps, n_steps_out):
    X, y = list(), list()
    for i in range(n_steps_out):
        # gather input and output parts of the pattern
        y.append(sequences[n_steps + i:n_steps + i + 1, -1][0])
        X.append(sequences[:n_steps, :])
    return np.asarray(X).reshape(n_steps, sequences.shape[1]), np.asarray(y).reshape(n_steps_out)
min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
X_train_minmax = min_max_scaler.fit_transform(X_train)  # all features included
X_test_minmax = min_max_scaler.fit_transform(X_test)
print(X_train_minmax.shape)
print(X_test_minmax.shape)
seq_samples = 631
seq_samples2 = 99
time_steps = 225
periods_to_predict = 1
periods_to_train = time_steps - periods_to_predict  # here may be a problem

features = 167
X_train_reshaped = X_train_minmax.reshape(seq_samples, time_steps, features)
X_test_reshaped = X_test_minmax.reshape(seq_samples2, time_steps, features)

data_train = [split_sequences(x, periods_to_train, periods_to_predict) for x in X_train_reshaped]  # and here I should check the function
data_test = [split_sequences(x, periods_to_train, periods_to_predict) for x in X_test_reshaped]
X_train, y_train, X_test, y_test = [], [], [], []
for x in data_train:
    X_train.append(x[0])
    y_train.append(x[1])
for x in data_test:
    X_test.append(x[0])
    y_test.append(x[1])
X_train = np.asarray(X_train)
y_train = np.asarray(y_train)
X_test = np.asarray(X_test)
y_test = np.asarray(y_test)
I experimented with the following shapes for the input data
print(X_train.shape) #(631, 224, 167)
print(X_test.shape) #(99, 224, 167)
print(y_train.shape) #(631, 1)
print(np.unique(y_train)) #[0. 1.]
y_train_cat=to_categorical(y_train)
print(y_train_cat.shape) #(631, 2)
Both the categorical and the binary models produce NaNs in prediction, and the training is clearly wrong. It must be something obvious that I'm missing (I suspected problems in the training period of 224, i.e. 225 - 1, or in units=2 in the last layer). I tried different shapes and combinations but failed, and I will greatly appreciate any clue.
model = Sequential([
    LSTM(units=100,
         input_shape=(periods_to_train, features), kernel_initializer='he_uniform',
         activation='linear', kernel_constraint=maxnorm(3.), return_sequences=False),
    Dropout(rate=0.5),
    Dense(units=100, kernel_initializer='he_uniform',
          activation='linear', kernel_constraint=maxnorm(3)),
    Dropout(rate=0.5),
    Dense(units=100, kernel_initializer='he_uniform',
          activation='linear', kernel_constraint=maxnorm(3)),
    Dropout(rate=0.5),
    Dense(units=1, kernel_initializer='he_uniform', activation='sigmoid')])

# Compile model
optimizer = Adamax(lr=0.001, decay=0.1)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])

configure(gpu_ind=True)
model.fit(X_train, y_train, validation_split=0.1, batch_size=100, epochs=8, shuffle=True)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 100) 107200
_________________________________________________________________
dropout_1 (Dropout) (None, 100) 0
_________________________________________________________________
dense_1 (Dense) (None, 100) 10100
_________________________________________________________________
dropout_2 (Dropout) (None, 100) 0
_________________________________________________________________
dense_2 (Dense) (None, 100) 10100
_________________________________________________________________
dropout_3 (Dropout) (None, 100) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 101
=================================================================
Total params: 127,501
Trainable params: 127,501
Non-trainable params: 0
_________________________________________________________________
This is my predicted array:
y_hat_val = model.predict(X_test)
[nan],
[nan],
[nan],
...
Thanks for the help!
After running simulations, I found that the maximal time_steps (m) that does not result in NaNs for a matrix of this shape is m = 163.
After that correction the model produces meaningful predictions.
Another issue to look at is the preparation of the input train set: if the return_sequences argument is used, the train set should include the actual N time steps, not N - 1 as in the example.
Below is how the train set can be transformed:
X_train_minmax = min_max_scaler.fit_transform(X_train)  # all features included
X_train_reshaped = X_train_minmax.reshape(seq_samples, time_steps, features)
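A hypothetical illustration of applying the m = 163 cap reported above: reshape with the full 225 steps first, then keep only the most recent 163 per product (the truncation slice is an assumption on top of the answer, not part of it):
m = 163  # empirical cap on time_steps that avoided NaNs here
X_train_reshaped = X_train_minmax.reshape(seq_samples, time_steps, features)
X_train_truncated = X_train_reshaped[:, -m:, :]  # shape (631, 163, 167)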
I want to implement a word2vec model using Keras. This is how I prepared my training data:
encoded = tokenizer.texts_to_sequences(data)
sequences = list()
for i in range(1, len(encoded)):
    sent = encoded[i]
    _4grams = list(nltk.ngrams(sent, n=4))
    for gram in _4grams:
        sequences.append(gram)

# split into X and y elements
sequences = np.array(sequences)
X, y = sequences[:, 0:3], sequences[:, 3]
X = to_categorical(X, num_classes=vocab_size)
y = to_categorical(y, num_classes=vocab_size)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=42)
The following is my model in Keras:
model = Sequential()
model.add(Dense(50, input_shape=Xtrain.shape))
model.add(Dense(Ytrain.shape[1]))
model.add(Activation("softmax"))
Xtrain has shape (6960, 3, 4048), and the model summary is:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_22 (Dense) (None, 6960, 3, 50) 202450
_________________________________________________________________
dense_23 (Dense) (None, 6960, 3, 4048) 206448
_________________________________________________________________
activation_10 (Activation) (None, 6960, 3, 4048) 0
=================================================================
Total params: 408,898
Trainable params: 408,898
Non-trainable params: 0
_________________________________________________________________
None
I got the error:
history = model.fit(Xtrain, Ytrain, epochs=10, verbose=1, validation_data=(Xtest, Ytest))
Error when checking input: expected dense_22_input to have 4 dimensions, but got array with shape (6960, 3, 4048)
I'm confused about how to prepare and feed my training data to the Keras network.
Input shape in Keras does not necessarily mean the shape of the input dataset. It is the shape of a single data point in the dataset (the input shape without the batch dimension). You are specifying the input shape as the shape of the whole input dataset, including the batch dimension. The correct input shape in your case is Xtrain.shape[1:]:
model = Sequential()
model.add(Dense(50, input_shape=Xtrain.shape[1:]))
model.add(Dense(Ytrain.shape[1]))
model.add(Activation("softmax"))
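Note that Dense applied to a 3-D input acts on the last axis only, so even with this fix the model outputs (None, 3, 4048) while Ytrain has shape (None, 4048). A hedged sketch of one way to make the target shapes agree, flattening the three one-hot context words first (this goes beyond the original answer):
from keras.models import Sequential
from keras.layers import Dense, Flatten, Activation

model = Sequential()
model.add(Flatten(input_shape=Xtrain.shape[1:]))  # (3, 4048) -> (12144,)
model.add(Dense(50))
model.add(Dense(Ytrain.shape[1]))                 # 4048-way output
model.add(Activation("softmax"))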
I am trying to replicate the network from:
https://arxiv.org/pdf/1604.07176.pdf
I see their implementation in
https://github.com/wentaozhu/protein-cascade-cnn-lstm/blob/master/cb6133.py
I am trying to train on the Q8 task only, with no solvent task. I am trying to concatenate the [512, 50] embedding output with the [512, 22] auxiliary input, but I keep getting various errors. This is how I am trying to concatenate at the moment:
main_input = Input(shape=(maxlen_seq,), dtype='int32', name='main_input')
# Embedding layer mapping each of the n_words tokens to a vector of length 50
x = Embedding(input_dim=n_words, output_dim=50, input_length=maxlen_seq)(main_input)

aux_input = Input(shape=(maxlen_seq, n_words), name='aux_input')
x = concatenate([x, aux_input], axis=-1)
# ... rest of model ...
The model compiles fine with model.compile():
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
main_input (InputLayer) (None, 512) 0
__________________________________________________________________________________________________
embedding_27 (Embedding) (None, 512, 50) 1100 main_input[0][0]
__________________________________________________________________________________________________
aux_input (InputLayer) (None, 512, 22) 0
__________________________________________________________________________________________________
concatenate_54 (Concatenate) (None, 512, 72) 0 embedding_27[0][0]
aux_input[0][0]
________________________________________________________________________________________
...
But I get a ValueError: Error when checking input: expected aux_input to have 3 dimensions, but got array with shape (4464, 512)
The model is defined as:
model = Model([main_input, aux_input], [y1, y2])
And fit as:
model.fit({'main_input': X_train,
           'aux_input': X_train},
          {'main_output': y_train,
           'aux_output': y_train},
          batch_size=128, epochs=20, callbacks=[early, best_model],
          validation_data=({'main_input': X_val,
                            'aux_input': X_val},
                           {'main_output': y_val,
                            'aux_output': y_val}),
          verbose=1)
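For reference, aux_input is declared with per-sample shape (maxlen_seq, n_words) = (512, 22), but the fit call feeds it the integer-encoded X_train of shape (4464, 512). A hedged sketch of one way to make the shapes agree, assuming X_train and X_val hold integer token ids in [0, n_words):
from keras.utils import to_categorical

# One-hot encode the ids so aux_input receives (4464, 512, 22)
X_train_onehot = to_categorical(X_train, num_classes=n_words)
X_val_onehot = to_categorical(X_val, num_classes=n_words)

model.fit({'main_input': X_train, 'aux_input': X_train_onehot},
          {'main_output': y_train, 'aux_output': y_train},
          batch_size=128, epochs=20, callbacks=[early, best_model],
          validation_data=({'main_input': X_val, 'aux_input': X_val_onehot},
                           {'main_output': y_val, 'aux_output': y_val}),
          verbose=1)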