How to work with 3 dimensional word input in Keras - python

I want to implement a word2vec using Keras. This is how I prepared my training data:
encoded = tokenizer.texts_to_sequences(data)
sequences = list()
for i in range(1, len(encoded)):
    sent = encoded[i]
    _4grams = list(nltk.ngrams(sent, n=4))
    for gram in _4grams:
        sequences.append(gram)
# split into X and y elements
sequences = np.array(sequences)
X, y = sequences[:, 0:3], sequences[:, 3]
X = to_categorical(X, num_classes=vocab_size)
y = to_categorical(y, num_classes=vocab_size)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=42)
The following is my model in Keras:
model = Sequential()
model.add(Dense(50, input_shape=Xtrain.shape))
model.add(Dense(Ytrain.shape[1]))
model.add(Activation("softmax"))
Xtrain (6960, 3, 4048)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_22 (Dense) (None, 6960, 3, 50) 202450
_________________________________________________________________
dense_23 (Dense) (None, 6960, 3, 4048) 206448
_________________________________________________________________
activation_10 (Activation) (None, 6960, 3, 4048) 0
=================================================================
Total params: 408,898
Trainable params: 408,898
Non-trainable params: 0
_________________________________________________________________
None
I got the error:
history = model.fit(Xtrain, Ytrain, epochs=10, verbose=1, validation_data=(Xtest, Ytest))
Error when checking input: expected dense_22_input to have 4 dimensions, but got array with shape (6960, 3, 4048)
I'm confused about how to prepare my training data and feed it to a Keras neural network.

The input shape in Keras is not the shape of the whole input dataset. It is the shape of a single data point, that is, the shape of the dataset without the batch dimension. You are passing the shape of the full dataset, batch dimension included, so Keras adds another batch dimension on top and expects 4-dimensional input. The correct input shape in your case is Xtrain.shape[1:].
model = Sequential()
model.add(Dense(50, input_shape=Xtrain.shape[1:]))
model.add(Dense(Ytrain.shape[1]))
model.add(Activation("softmax"))
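A minimal NumPy sketch of the distinction, using small made-up sizes in place of (6960, 3, 4048); the variable names are illustrative and do not come from the question. It also shows that a dense layer acts on the last axis only, so a 3-D input still gives a 3-D output. The flattening step at the end is an assumption added here, not part of the answer above; something like it may be needed before labels of shape (samples, vocab_size) can match the model output.

```python
import numpy as np

# Toy stand-ins for the real sizes (6960 samples, 3 context words, 4048-word vocab).
n_samples, context, vocab_size, hidden = 8, 3, 5, 4

rng = np.random.default_rng(0)
X = rng.random((n_samples, context, vocab_size))   # whole dataset, batch axis first

# Keras' input_shape describes ONE sample: the dataset shape minus the batch axis.
per_sample_shape = X.shape[1:]                      # (3, 5)

# A dense layer is a matrix product over the last axis only.
W = rng.random((vocab_size, hidden))
dense_out = X @ W                                   # (8, 3, 4): still 3-D

# Flattening each sample first yields one vector per sample (assumed extra step).
X_flat = X.reshape(n_samples, -1)                   # (8, 15)
W_flat = rng.random((context * vocab_size, hidden))
dense_out_flat = X_flat @ W_flat                    # (8, 4)

print(per_sample_shape, dense_out.shape, dense_out_flat.shape)
```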

Related

How do I correctly shape my input data for a Keras model?

I'm currently working on a Keras neural network for fun. I'm just learning the basics, but I can't get past this dimension problem:
So my input data (X) should be a 12x6 matrix, with 12 timestamps and 6 different data values for every timestamp:
X = np.zeros([2867, 12, 6])
Y = np.zeros([2867, 3])
My Output (Y) should be a one-hot encoded 3x1 vector.
Now I want to feed this data through the following LSTM model:
model = Sequential()
model.add(LSTM(30, activation="softsign", return_sequences=True, input_shape=(12, 6)))
model.add(Dense(3))
model.summary()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x=X, y=Y, batch_size=100, epochs=1000, verbose=2, validation_split=0.2)
The Summary looks like this:
Model: "sequential"
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 12, 30) 4440
_________________________________________________________________
dense (Dense) (None, 12, 3) 93
=================================================================
Total params: 4,533
Trainable params: 4,533
Non-trainable params: 0
_________________________________________________________________
When I run this program, I get this error:
ValueError: Shapes (None, 3) and (None, 12, 3) are incompatible.
I already tried to reshape my data to a 72x1 vector, but this doesn't work either.
Maybe someone can help me shape my input data correctly :).
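You probably need to define your model as follows, since you are using the categorical_crossentropy loss function: set return_sequences=False so the LSTM returns only its last output, and give the final Dense layer a softmax activation.
model.add(LSTM(30, activation="softsign",
               return_sequences=False, input_shape=(12, 6)))
model.add(Dense(3, activation='softmax'))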
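The mismatch comes down to whether the LSTM passes on all 12 time steps or only the last one. A rough NumPy sketch of the shapes involved: random numbers stand in for real LSTM outputs, and `lstm_all_steps`, `W`, and `softmax` are names made up for this illustration.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis, shifted for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
batch, timesteps, units, classes = 4, 12, 30, 3

# Stand-in for an LSTM with return_sequences=True: one vector per time step.
lstm_all_steps = rng.standard_normal((batch, timesteps, units))
# With return_sequences=False only the final time step is passed on.
lstm_last_step = lstm_all_steps[:, -1, :]

W = rng.standard_normal((units, classes))   # weights of the final Dense(3) layer

probs_all = softmax(lstm_all_steps @ W)     # (4, 12, 3): cannot match Y of shape (batch, 3)
probs_last = softmax(lstm_last_step @ W)    # (4, 3): matches one-hot Y

print(probs_all.shape, probs_last.shape)
```

Each row of `probs_last` sums to 1, which is what categorical_crossentropy expects from the final layer.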

ValueError: Shape mismatch: The shape of labels (received (1,)) should equal the shape of logits except for the last dimension (received (10, 30))

I'm fairly new to TensorFlow and would appreciate answers a lot.
I'm trying to use a transformer model as an embedding layer and feed the data to a custom model.
import tensorflow as tf
from transformers import TFAutoModel
from tensorflow.keras import layers
def build_model():
    transformer_model = TFAutoModel.from_pretrained(MODEL_NAME, config=config)
    input_ids_in = layers.Input(shape=(MAX_LEN,), name='input_ids', dtype='int32')
    input_masks_in = layers.Input(shape=(MAX_LEN,), name='attention_mask', dtype='int32')
    # index 0 of the transformer output is the per-token hidden state
    embedding_layer = transformer_model(input_ids_in, attention_mask=input_masks_in)[0]
    X = layers.Bidirectional(tf.keras.layers.LSTM(50, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(embedding_layer)
    X = layers.GlobalMaxPool1D()(X)
    X = layers.Dense(64, activation='relu')(X)
    X = layers.Dropout(0.2)(X)
    X = layers.Dense(30, activation='softmax')(X)
    model = tf.keras.Model(inputs=[input_ids_in, input_masks_in], outputs=X)
    # freeze the two input layers and the transformer
    for layer in model.layers[:3]:
        layer.trainable = False
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
model = build_model()
model.summary()
r = model.fit(
    train_ds,
    steps_per_epoch=train_steps,
    epochs=EPOCHS,
    verbose=3)
I have 30 classes and the labels are not one-hot encoded, so I'm using sparse_categorical_crossentropy as my loss function, but I keep getting the following error:
ValueError: Shape mismatch: The shape of labels (received (1,)) should equal the shape of logits except for the last dimension (received (10, 30)).
How can I solve this?
And why is the (10, 30) shape required? I know the 30 comes from the last Dense layer with 30 units, but why the 10? Is it because of MAX_LEN, which is 10?
My model summary:
Model: "model_16"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_ids (InputLayer) [(None, 10)] 0
__________________________________________________________________________________________________
attention_mask (InputLayer) [(None, 10)] 0
__________________________________________________________________________________________________
tf_bert_model_21 (TFBertModel) TFBaseModelOutputWit 162841344 input_ids[0][0]
attention_mask[0][0]
__________________________________________________________________________________________________
bidirectional_17 (Bidirectional (None, 10, 100) 327600 tf_bert_model_21[0][0]
__________________________________________________________________________________________________
global_max_pooling1d_15 (Global (None, 100) 0 bidirectional_17[0][0]
__________________________________________________________________________________________________
dense_32 (Dense) (None, 64) 6464 global_max_pooling1d_15[0][0]
__________________________________________________________________________________________________
dropout_867 (Dropout) (None, 64) 0 dense_32[0][0]
__________________________________________________________________________________________________
dense_33 (Dense) (None, 30) 1950 dropout_867[0][0]
==================================================================================================
Total params: 163,177,358
Trainable params: 336,014
Non-trainable params: 162,841,344
10 is the number of sequences in one batch. I suspect that it is also the number of sequences in your dataset.
Your model acts as a sequence classifier, so you need one label for every sequence: labels of shape (10,) for logits of shape (10, 30).
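The shape rule in the error message can be illustrated with a small NumPy re-implementation of the loss. This is a sketch written for this note, not the TensorFlow code; it only handles 1-D integer labels, and the function name simply mirrors the Keras loss name.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean cross-entropy for integer labels (batch,) and probabilities (batch, classes)."""
    labels = np.asarray(labels)
    probs = np.asarray(probs)
    if labels.shape != probs.shape[:-1]:
        raise ValueError(
            f"labels shape {labels.shape} should equal probs shape "
            f"without its last axis {probs.shape[:-1]}"
        )
    picked = probs[np.arange(labels.shape[0]), labels]   # probability of the true class
    return float(-np.log(picked).mean())

batch, classes = 10, 30
probs = np.full((batch, classes), 1.0 / classes)   # uniform predictions for 10 sequences

good_labels = np.arange(batch) % classes            # one integer label per sequence: shape (10,)
loss = sparse_categorical_crossentropy(good_labels, probs)

try:
    sparse_categorical_crossentropy(np.array([3]), probs)   # a single label for 10 sequences
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(loss, mismatch_detected)
```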

LSTM input produces nans in non-text classification problem

An LSTM model on non-text data is trained to classify two classes.
I have 225 time points for each product (N=730), with 167 features including the target. Only the last time point is to be predicted.
I use the target as a feature in predictions. Here is how I prepare the input:
def split_sequences(sequences, n_steps, n_steps_out):
    X, y = list(), list()
    for i in range(n_steps_out):
        # gather input and output parts of the pattern
        y.append(sequences[n_steps + i:n_steps + i + 1, -1][0])
        #targ = sequences[n_steps + i:n_steps + i + 1, -1][0]
        #y.append(int(targ)) if ((targ==0) | (targ==1)) else y.append(2)
        X.append(sequences[:n_steps, :])
    return np.asarray(X).reshape(n_steps, sequences.shape[1]), np.asarray(y).reshape(n_steps_out)
#del X_train_minmax, X_test_minmax
min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
#X_train_minmax = min_max_scaler.fit_transform(X_train.iloc[:, 0:166])
#X_test_minmax = min_max_scaler.fit_transform(X_test.iloc[:, 0:166])
X_train_minmax = min_max_scaler.fit_transform(X_train) ##all features included
X_test_minmax = min_max_scaler.fit_transform(X_test)
print(X_train_minmax.shape)
print(X_test_minmax.shape)
seq_samples = 631
seq_samples2 = 99
time_steps = 225
periods_to_predict = 1
periods_to_train = time_steps - periods_to_predict ##here may be a problem
#
features = 167
X_train_reshaped = X_train_minmax.reshape(seq_samples,time_steps,features)
X_test_reshaped = X_test_minmax.reshape(seq_samples2, time_steps,features)
data_train = [split_sequences(x, periods_to_train , periods_to_predict) for x in X_train_reshaped] ##and here I should check the function
data_test = [split_sequences(x, periods_to_train , periods_to_predict) for x in X_test_reshaped]
X_train, y_train, X_test, y_test = [], [], [], []
for x in data_train:
    X_train.append(x[0])
    y_train.append(x[1])
for x in data_test:
    X_test.append(x[0])
    y_test.append(x[1])
X_train = np.asarray(X_train)
y_train = np.asarray(y_train)
X_test = np.asarray(X_test)
y_test = np.asarray(y_test)
I experimented with the following shapes for the input data
print(X_train.shape) #(631, 224, 167)
print(X_test.shape) #(99, 224, 167)
print(y_train.shape) #(631, 1)
print(np.unique(y_train)) #[0. 1.]
y_train_cat=to_categorical(y_train)
print(y_train_cat.shape) #(631, 2)
Both the categorical and the binary models produce NaNs in prediction, and the training is clearly wrong. It must be something obvious that I'm missing (I suspected a problem with the training periods = 224, i.e. 225 - 1, or with units=2 in the last layer). I tried different shapes and combinations but failed, and would greatly appreciate any clue.
model=Sequential([
    LSTM(units=100,
         input_shape=(periods_to_train,features), kernel_initializer='he_uniform',
         activation ='linear', kernel_constraint=maxnorm(3.), return_sequences=False),
    Dropout(rate=0.5),
    Dense(units=100, kernel_initializer='he_uniform',
          activation='linear', kernel_constraint=maxnorm(3)),
    Dropout(rate=0.5),
    Dense(units=100, kernel_initializer='he_uniform',
          activation='linear', kernel_constraint=maxnorm(3)),
    Dropout(rate=0.5),
    Dense(units=1, kernel_initializer='he_uniform', activation='sigmoid')])
# Compile model
optimizer = Adamax(lr=0.001, decay=0.1)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
configure(gpu_ind=True)
model.fit(X_train, y_train, validation_split=0.1, batch_size=100, epochs=8, shuffle=True)
_________________________________________________________________
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 100) 107200
_________________________________________________________________
dropout_1 (Dropout) (None, 100) 0
_________________________________________________________________
dense_1 (Dense) (None, 100) 10100
_________________________________________________________________
dropout_2 (Dropout) (None, 100) 0
_________________________________________________________________
dense_2 (Dense) (None, 100) 10100
_________________________________________________________________
dropout_3 (Dropout) (None, 100) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 101
=================================================================
Total params: 127,501
Trainable params: 127,501
Non-trainable params: 0
_________________________________________________________________
This is my predicted array,
y_hat_val = model.predict(X_test)
[nan],
[nan],
[nan],
[nan],
[nan],
[nan],
[nan],
[nan],
[nan],
[nan],
[nan],
Thanks for the help!
After running simulations, I found that the maximum number of time_steps (m) that does not result in NaNs for a matrix of this shape is m = 163.
After that correction the model produces meaningful predictions.
Another issue to look at is the preparation of the input training set.
If the return_sequences argument is used, the training set should include the actual N time_steps and not N-1 as in the example.
Below is how the training set can be transformed:
X_train_minmax = min_max_scaler.fit_transform(X_train) ##all features included
X_train_reshaped = X_train_minmax.reshape(seq_samples,time_steps,features)
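As a small sanity check on this kind of preparation, the reshape and the input/target split from the question can be tried on toy data. The sketch below uses made-up sizes instead of 631 x 225 x 167 and hypothetical names (`flat`, `X_seq`, `X_in`, `y_last`). It assumes the rows of the flat matrix are ordered sample by sample, and that the target is the last column at the last time step, as in the question's split_sequences. It also checks for non-finite values, since a NaN or inf already present in the inputs would propagate into the loss and the predictions.

```python
import numpy as np

seq_samples, time_steps, features = 6, 5, 4   # toy sizes, not the real 631 x 225 x 167

rng = np.random.default_rng(0)
# Flat matrix as it comes out of the scaler; rows assumed ordered sample by sample.
flat = rng.random((seq_samples * time_steps, features))

# Same reshape as in the post: (samples * time_steps, features) -> (samples, time_steps, features)
X_seq = flat.reshape(seq_samples, time_steps, features)

# Inputs: all but the last time step; target: last column at the last time step.
X_in = X_seq[:, :-1, :]          # (6, 4, 4)
y_last = X_seq[:, -1, -1]        # (6,)

# True only if neither the inputs nor the targets contain NaN or inf.
inputs_are_finite = bool(np.isfinite(X_in).all() and np.isfinite(y_last).all())

print(X_in.shape, y_last.shape, inputs_are_finite)
```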

Keras: ValueError: Error when checking target: expected dense_2 to have shape (10,) but got array with shape (1,)

I'm new to deep learning and Keras. When I used Keras to fit an LSTM model, I got the following error message: ValueError: Error when checking target: expected dense_2 to have shape (10,) but got array with shape (1,)
Here is my code for building the LSTM:
def build(self, embedding_matrix, dim, num_class, vocab_size, maxlen):
    model = Sequential()
    model.add(Embedding(vocab_size, dim, weights = [embedding_matrix],
                        input_length = maxlen, trainable = False)) ## pre-trained model
    model.add(LSTM(dim))
    model.add(Dense(dim, activation = "relu"))
    model.add(Dense(num_class, activation = "softmax"))
    self.model = model
Before posting, I tried several solutions mentioned in other SO posts, for example using to_categorical to transform the labels and using Flatten before the final layer. Sadly, none of them worked.
Here is my log file from running the script:
Start to fit GLOVE with reported data
Applying GLOVE pre-trained model
Number of unique tokens: 308758
Pre-trained model finished
Applying keras text pre-processing
Finish to apply keras text pre-processing
Start to fit the model...
Build LSTM model...
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 36303, 300) 92627400
_________________________________________________________________
lstm_1 (LSTM) (None, 300) 721200
_________________________________________________________________
dense_1 (Dense) (None, 300) 90300
_________________________________________________________________
dense_2 (Dense) (None, 10) 3010
=================================================================
Total params: 93,441,910
Trainable params: 814,510
Non-trainable params: 92,627,400
_________________________________________________________________
None
Finish model building
It went well until history = self.model.fit(train_padded, y_train, epochs = 10, batch_size = 128, validation_split = 0.2), then I got the above error.
I have run out of solutions. Any help will be appreciated!
Edit:
About y_train, here is the code I use for building it:
labels = dt["category"].values
num_class = len(np.unique(labels))
classes = np.unique(labels)
le = LabelEncoder()
y = le.fit_transform(labels)
y = to_categorical(y, num_class)
## split to training and test set
x_train, y_train, x_test, y_test = train_test_split(text, y, test_size = 0.33,
                                                    random_state = 42,
                                                    stratify = dt["category"].astype("str"))
Another update: here are the shapes.
The shape of y_train: (48334,)
The shape of x_train: (98132,)
The shape of y_test: (48334, 10)
The shape of x_test: (98132, 10)
The problem is that you are unpacking x_train, y_train, x_test, y_test in the wrong order, so the arrays are assigned to the wrong names. train_test_split returns its outputs in the following order:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,random_state=42)
whereas you have x_train, y_train, x_test, y_test.
See the train_test_split docs.
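The effect of the unpacking order can be seen without scikit-learn. In the sketch below, `split_in_sklearn_order` is a hypothetical stand-in written for this note. It is not the library function and does no shuffling or stratification; it only returns its four pieces in the same order as train_test_split.

```python
import numpy as np

def split_in_sklearn_order(X, y, test_size=0.33):
    """Hypothetical stand-in: returns X_train, X_test, y_train, y_test (no shuffling)."""
    n_test = int(round(len(X) * test_size))
    n_train = len(X) - n_test
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

X = np.arange(12).reshape(12, 1)          # 12 samples, 1 feature
y = np.eye(3)[np.arange(12) % 3]          # one-hot labels, shape (12, 3)

# Correct unpacking order.
X_train, X_test, y_train, y_test = split_in_sklearn_order(X, y)

# Order used in the question: the second name receives the test inputs,
# the third name receives the training labels.
x_train_bad, y_train_bad, x_test_bad, y_test_bad = split_in_sklearn_order(X, y)

print(y_train.shape, y_train_bad.shape)
```

Printing the shapes right after the split, as the question's update does, is a quick way to catch this: inputs and labels that belong together must have the same first dimension.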

Keras dense layer shape mismatch

I am trying to make a multiclass classifier in Keras, but I am getting a dimension mismatch in the Dense layer.
MAX_SENT_LENGTH = 100
MAX_SENTS = 15
EMBEDDING_DIM = 100
x_train = data[:-nb_validation_samples]
y_train = labels[:-nb_validation_samples]
x_val = data[-nb_validation_samples:]
y_val = labels[-nb_validation_samples:]
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SENT_LENGTH,
                            trainable=True)
sentence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sentence_input)
l_lstm = Bidirectional(LSTM(100))(embedded_sequences)
sentEncoder = Model(sentence_input, l_lstm)
review_input = Input(shape=(MAX_SENTS,MAX_SENT_LENGTH), dtype='int32')
review_encoder = TimeDistributed(sentEncoder)(review_input)
l_lstm_sent = Bidirectional(LSTM(100))(review_encoder)
preds = Dense(7, activation='softmax')(l_lstm_sent)
model = Model(review_input, preds)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=10, batch_size=50)
The class labels are transformed into a 1-hot vector correctly, but when trying to fit the model, I am getting this mismatch error:
('Shape of data tensor:', (5327, 15, 100))
('Shape of label tensor:', (5327, 7))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 15, 100) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 15, 200) 351500
_________________________________________________________________
bidirectional_2 (Bidirection (None, 200) 240800
_________________________________________________________________
dense_1 (Dense) (None, 7) 1407
=================================================================
Total params: 592,501
Trainable params: 592,501
Non-trainable params: 0
_________________________________________________________________
None
ValueError: Error when checking target: expected dense_1 to have shape (None, 1) but got array with shape (4262, 7)
Where does this (None, 1) dimension come from and how can I solve this error?
You should use loss='categorical_crossentropy' instead of loss='sparse_categorical_crossentropy' if your label is one-hot encoded. 'sparse_categorical_crossentropy' takes integer labels, and that's why (None,1) dimension is required.
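The relationship between the two label formats can be sketched in NumPy. The two functions below are simplified re-implementations written for this note, not the Keras code; they share the names of the Keras losses only for readability. They show that one-hot labels and their argmax give the same loss value, so the only choice is which format the loss function expects.

```python
import numpy as np

def categorical_crossentropy(one_hot, probs):
    """Mean cross-entropy for one-hot labels of shape (batch, classes)."""
    return float(-(one_hot * np.log(probs)).sum(axis=1).mean())

def sparse_categorical_crossentropy(int_labels, probs):
    """Mean cross-entropy for integer labels of shape (batch,)."""
    picked = probs[np.arange(len(int_labels)), int_labels]
    return float(-np.log(picked).mean())

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
one_hot = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1]])

int_labels = one_hot.argmax(axis=1)   # one-hot -> integer class ids

loss_one_hot = categorical_crossentropy(one_hot, probs)
loss_sparse = sparse_categorical_crossentropy(int_labels, probs)
print(loss_one_hot, loss_sparse)
```

So an alternative to switching the loss would be to keep sparse_categorical_crossentropy and pass integer labels obtained with argmax.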
