LSTM input produces nans in non-text classification problem - python

An LSTM model on non-text data is trained to classify two classes.
I have 225 time points for each product (N=730), with 167 features including the target. Only the last time point is to be predicted.
I use the target as a feature in the predictions. Here is how I prepare the input:
import numpy as np
from sklearn import preprocessing

def split_sequences(sequences, n_steps, n_steps_out):
    X, y = list(), list()
    for i in range(n_steps_out):
        # gather input and output parts of the pattern
        y.append(sequences[n_steps + i:n_steps + i + 1, -1][0])
        #targ = sequences[n_steps + i:n_steps + i + 1, -1][0]
        #y.append(int(targ)) if ((targ==0) | (targ==1)) else y.append(2)
        X.append(sequences[:n_steps, :])
    return np.asarray(X).reshape(n_steps, sequences.shape[1]), np.asarray(y).reshape(n_steps_out)

#del X_train_minmax, X_test_minmax
min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
#X_train_minmax = min_max_scaler.fit_transform(X_train.iloc[:, 0:166])
#X_test_minmax = min_max_scaler.fit_transform(X_test.iloc[:, 0:166])
X_train_minmax = min_max_scaler.fit_transform(X_train) ##all features included
X_test_minmax = min_max_scaler.transform(X_test)  # reuse the scaling fitted on the training data
print(X_train_minmax.shape)
print(X_test_minmax.shape)
seq_samples = 631
seq_samples2 = 99
time_steps = 225
periods_to_predict = 1
periods_to_train = time_steps - periods_to_predict ##here may be a problem
#
features = 167
X_train_reshaped = X_train_minmax.reshape(seq_samples, time_steps, features)
X_test_reshaped = X_test_minmax.reshape(seq_samples2, time_steps, features)
data_train = [split_sequences(x, periods_to_train, periods_to_predict) for x in X_train_reshaped] ##and here I should check the function
data_test = [split_sequences(x, periods_to_train, periods_to_predict) for x in X_test_reshaped]
X_train, y_train, X_test, y_test = [], [], [], []
for x in data_train:
    X_train.append(x[0])
    y_train.append(x[1])
for x in data_test:
    X_test.append(x[0])
    y_test.append(x[1])
X_train = np.asarray(X_train)
y_train = np.asarray(y_train)
X_test = np.asarray(X_test)
y_test = np.asarray(y_test)
These are the shapes of the input data I experimented with:
print(X_train.shape) #(631, 224, 167)
print(X_test.shape) #(99, 224, 167)
print(y_train.shape) #(631, 1)
print(np.unique(y_train)) #[0. 1.]
y_train_cat=to_categorical(y_train)
print(y_train_cat.shape) #(631, 2)
Both the categorical and the binary models produce NaNs in prediction, and the training is clearly wrong. It must be something obvious that I'm missing (I suspected problems with the training period of 224, i.e. 225 - 1, or with units=2 in the last layer). I tried different shapes and combinations but failed, and I would greatly appreciate any clue.
model = Sequential([
    LSTM(units=100,
         input_shape=(periods_to_train, features), kernel_initializer='he_uniform',
         activation='linear', kernel_constraint=maxnorm(3.), return_sequences=False),
    Dropout(rate=0.5),
    Dense(units=100, kernel_initializer='he_uniform',
          activation='linear', kernel_constraint=maxnorm(3)),
    Dropout(rate=0.5),
    Dense(units=100, kernel_initializer='he_uniform',
          activation='linear', kernel_constraint=maxnorm(3)),
    Dropout(rate=0.5),
    Dense(units=1, kernel_initializer='he_uniform', activation='sigmoid')])
# Compile model
optimizer = Adamax(lr=0.001, decay=0.1)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
configure(gpu_ind=True)
model.fit(X_train, y_train, validation_split=0.1, batch_size=100, epochs=8, shuffle=True)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 100) 107200
_________________________________________________________________
dropout_1 (Dropout) (None, 100) 0
_________________________________________________________________
dense_1 (Dense) (None, 100) 10100
_________________________________________________________________
dropout_2 (Dropout) (None, 100) 0
_________________________________________________________________
dense_2 (Dense) (None, 100) 10100
_________________________________________________________________
dropout_3 (Dropout) (None, 100) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 101
=================================================================
Total params: 127,501
Trainable params: 127,501
Non-trainable params: 0
_________________________________________________________________
This is my predicted array:
y_hat_val = model.predict(X_test)
[nan],
[nan],
[nan],
[nan],
[nan],
[nan],
[nan],
[nan],
[nan],
[nan],
[nan],
Thanks for the help!

After running experiments, I found that the maximal number of time steps (m) that does not produce NaNs for a matrix of this shape is m = 163.
After that correction, the model produces meaningful predictions.
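One way to check a limit like m = 163 directly, rather than through repeated training runs, is to look for the first time step that contains NaN or infinite values in the reshaped input. The sketch below assumes the NaNs come from the input data itself (scikit-learn's MinMaxScaler ignores NaNs when fitting and keeps them in its output, so missing values survive scaling); `first_nonfinite_step` is a hypothetical helper, not part of any library.

```python
import numpy as np


def first_nonfinite_step(x):
    """Index of the first time step holding a NaN/inf anywhere, or None if all values are finite.

    Assumes x is shaped (samples, time_steps, features), like X_train_reshaped.
    """
    bad = ~np.isfinite(x)              # True where a value is NaN or +/-inf
    bad_steps = bad.any(axis=(0, 2))   # one flag per time step, across all samples and features
    hits = np.flatnonzero(bad_steps)
    return int(hits[0]) if hits.size else None


# Toy data: 4 sequences, 10 time steps, 3 features, with one NaN injected at step 6.
rng = np.random.default_rng(0)
demo = rng.random((4, 10, 3))
demo[2, 6, 1] = np.nan
print(first_nonfinite_step(demo))  # 6
```

On the real data, `first_nonfinite_step(X_train_reshaped)` would show where the usable prefix of time steps ends, if NaN inputs are indeed the cause.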
Another issue to look at is the preparation of the input training set.
If the return_sequences argument is used, the training set
should include the actual number of time steps (N), not N-1 as in the example.
Below is how the training set can be transformed:
X_train_minmax = min_max_scaler.fit_transform(X_train) ##all features included
X_train_reshaped = X_train_minmax.reshape(seq_samples,time_steps,features)
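For periods_to_predict = 1, the split_sequences loop in the question amounts to slicing the 3-D array: the inputs are all features for the first time_steps - 1 steps, and the target is the last feature column at the final step. A NumPy sketch under that assumption (`split_last_step` is a hypothetical name):

```python
import numpy as np


def split_last_step(data):
    """Vectorized form of split_sequences(x, T - 1, 1) applied to every sample.

    data: array shaped (samples, T, features) with the target in the last column.
    Returns X shaped (samples, T - 1, features) and y shaped (samples, 1).
    """
    X = data[:, :-1, :]     # every feature for time steps 0 .. T-2
    y = data[:, -1, -1:]    # target column at the final step; the -1: slice keeps y 2-D
    return X, y


demo = np.arange(2 * 5 * 3, dtype=float).reshape(2, 5, 3)   # 2 samples, T = 5, 3 features
X, y = split_last_step(demo)
print(X.shape, y.shape)   # (2, 4, 3) (2, 1)
print(y.ravel())          # [14. 29.]
```

Applied to the question's (631, 225, 167) training array, this should give the (631, 224, 167) and (631, 1) shapes printed in the question, without the per-sample Python loop.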

Related

ValueError: Shape mismatch: The shape of labels (received (1,)) should equal the shape of logits except for the last dimension (received (10, 30))

I'm fairly new to TensorFlow and would greatly appreciate answers.
I'm trying to use a transformer model as an embedding layer and feed its output to a custom model.
import tensorflow as tf
from transformers import TFAutoModel
from tensorflow.keras import layers

def build_model():
    transformer_model = TFAutoModel.from_pretrained(MODEL_NAME, config=config)
    input_ids_in = layers.Input(shape=(MAX_LEN,), name='input_ids', dtype='int32')
    input_masks_in = layers.Input(shape=(MAX_LEN,), name='attention_mask', dtype='int32')
    embedding_layer = transformer_model(input_ids_in, attention_mask=input_masks_in)[0]
    X = layers.Bidirectional(tf.keras.layers.LSTM(50, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(embedding_layer)
    X = layers.GlobalMaxPool1D()(X)
    X = layers.Dense(64, activation='relu')(X)
    X = layers.Dropout(0.2)(X)
    X = layers.Dense(30, activation='softmax')(X)
    model = tf.keras.Model(inputs=[input_ids_in, input_masks_in], outputs=X)
    for layer in model.layers[:3]:
        layer.trainable = False
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

model = build_model()
model.summary()
r = model.fit(
    train_ds,
    steps_per_epoch=train_steps,
    epochs=EPOCHS,
    verbose=3)
I have 30 classes and the labels are not one-hot encoded, so I'm using sparse_categorical_crossentropy as my loss function, but I keep getting the following error:
ValueError: Shape mismatch: The shape of labels (received (1,)) should equal the shape of logits except for the last dimension (received (10, 30)).
How can I solve this?
And why is the (10, 30) shape required? I know 30 comes from the last Dense layer with 30 units, but why the 10? Is it because of MAX_LEN, which is 10?
My model summary:
Model: "model_16"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_ids (InputLayer) [(None, 10)] 0
__________________________________________________________________________________________________
attention_mask (InputLayer) [(None, 10)] 0
__________________________________________________________________________________________________
tf_bert_model_21 (TFBertModel) TFBaseModelOutputWit 162841344 input_ids[0][0]
attention_mask[0][0]
__________________________________________________________________________________________________
bidirectional_17 (Bidirectional (None, 10, 100) 327600 tf_bert_model_21[0][0]
__________________________________________________________________________________________________
global_max_pooling1d_15 (Global (None, 100) 0 bidirectional_17[0][0]
__________________________________________________________________________________________________
dense_32 (Dense) (None, 64) 6464 global_max_pooling1d_15[0][0]
__________________________________________________________________________________________________
dropout_867 (Dropout) (None, 64) 0 dense_32[0][0]
__________________________________________________________________________________________________
dense_33 (Dense) (None, 30) 1950 dropout_867[0][0]
==================================================================================================
Total params: 163,177,358
Trainable params: 336,014
Non-trainable params: 162,841,344
10 is the number of sequences in one batch. I suspect that it is the number of sequences in your dataset.
Your model acts as a sequence classifier, so you should have one label for every sequence.
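To make the shape rule concrete: sparse categorical cross-entropy expects one integer class label per row of predictions, so labels shaped (batch,) pair with softmax outputs shaped (batch, num_classes). The NumPy sketch below mimics that check on toy values; it is an illustration, not Keras's implementation, and `sparse_ce` is a hypothetical name.

```python
import numpy as np


def sparse_ce(labels, probs, eps=1e-7):
    """Per-sample cross-entropy from integer labels and softmax probabilities."""
    labels = np.asarray(labels, dtype=np.int64)
    probs = np.asarray(probs, dtype=float)
    # labels must match probs on every axis except the last (the class axis)
    if labels.shape != probs.shape[:-1]:
        raise ValueError(f"labels {labels.shape} vs probs {probs.shape}: "
                         "shapes must match except for the last dimension")
    picked = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    return -np.log(np.clip(picked, eps, 1.0))


probs = np.full((10, 30), 1 / 30)                        # 10 sequences, 30 classes
print(sparse_ce(np.zeros(10, dtype=int), probs).shape)   # (10,): one label per sequence works
try:
    sparse_ce(np.array([3]), probs)                      # a single label for 10 sequences
except ValueError as err:
    print(err)
```

The failing call reproduces the same kind of mismatch as the error in the question: one label against logits for 10 sequences.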

Keras, simple model, shapes incompatible, but why?

I'm working on a first model in Keras, just to get started. I downloaded the MNIST data from Kaggle, which has preprocessed a subset of the data into a csv with a label column and 784 (28x28) greyscale pixel values.
I'm getting a ValueError: Shapes (None, 1) and (None, 10) are incompatible. I can't understand why.
Here's my code:
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers

## load csv and partition into train/dev sets
dataframe = pd.read_csv('data/train.csv') ##load as pandas
dev_df = dataframe.sample(n=3000, random_state=1)
train_df = dataframe.drop(dev_df.index)
assert train_df.shape[1] == 785 #make sure it's not a problem with the data shape.
##Build the model
inputs = keras.Input(shape=(784,))
x = layers.experimental.preprocessing.Rescaling(1./255)(inputs)
x = layers.Dense(100, activation='relu')(x)
x = layers.Dense(100, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()
At this point, I get the following model summary, which looks right to me (i.e., it looks a lot like what I'm seeing in the guides I'm following):
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 784)] 0
_________________________________________________________________
rescaling_6 (Rescaling) (None, 784) 0
_________________________________________________________________
dense_3 (Dense) (None, 100) 78500
_________________________________________________________________
dense_4 (Dense) (None, 100) 10100
_________________________________________________________________
dense_5 (Dense) (None, 10) 1010
=================================================================
Total params: 89,610
Trainable params: 89,610
Non-trainable params: 0
Then
## compile and fit model.
model.compile(optimizer="adam", loss="categorical_crossentropy")
(X, y) = (train_df.drop('label', axis=1).to_numpy(), train_df['label'].to_numpy())
assert X.shape[1] == 784 ## we're really sure this is the right size.
assert X.shape[0] == y.shape[0] ## the number of labels matches the number of samples
history = model.fit(X, y, epochs=1, batch_size=64)
The last line raises the error. I would guess it emerges with the final dense layer, since it has the expected (None,10) shape, but I can't figure out where the (None, 1) shaped entity (whatever it is) comes from.
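A likely explanation: the label column holds one integer (0 to 9) per row, which reaches the loss as a (None, 1) target, while loss="categorical_crossentropy" expects one-hot targets shaped like the (None, 10) softmax output. Either compile with loss="sparse_categorical_crossentropy", which accepts integer labels, or one-hot encode y before fitting (keras.utils.to_categorical does this). The NumPy sketch below shows the encoding and the resulting shape; `one_hot` is a hypothetical helper used only for illustration.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels shaped (n,) into a float matrix shaped (n, num_classes)."""
    labels = np.asarray(labels, dtype=np.int64)
    out = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    out[np.arange(labels.shape[0]), labels] = 1.0   # one 1.0 per row, at the label's column
    return out


y = np.array([3, 0, 9])        # e.g. three values from train_df['label']
print(y.shape)                 # (3,)
print(one_hot(y, 10).shape)    # (3, 10), matching the (None, 10) softmax output
```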

How to work with 3 dimensional word input in Keras

I want to implement a word2vec using Keras. This is how I prepared my training data:
import nltk
import numpy as np
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split

encoded = tokenizer.texts_to_sequences(data)
sequences = list()
for i in range(1, len(encoded)):
    sent = encoded[i]
    _4grams = list(nltk.ngrams(sent, n=4))
    for gram in _4grams:
        sequences.append(gram)
# split into X and y elements
sequences = np.array(sequences)
X, y = sequences[:, 0:3], sequences[:, 3]
X = to_categorical(X, num_classes=vocab_size)
y = to_categorical(y, num_classes=vocab_size)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=42)
The following is my model in Keras:
model = Sequential()
model.add(Dense(50, input_shape=Xtrain.shape))
model.add(Dense(Ytrain.shape[1]))
model.add(Activation("softmax"))
Xtrain (6960, 3, 4048)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_22 (Dense) (None, 6960, 3, 50) 202450
_________________________________________________________________
dense_23 (Dense) (None, 6960, 3, 4048) 206448
_________________________________________________________________
activation_10 (Activation) (None, 6960, 3, 4048) 0
=================================================================
Total params: 408,898
Trainable params: 408,898
Non-trainable params: 0
_________________________________________________________________
None
I got the error:
history = model.fit(Xtrain, Ytrain, epochs=10, verbose=1, validation_data=(Xtest, Ytest))
Error when checking input: expected dense_22_input to have 4 dimensions, but got array with shape (6960, 3, 4048)
I'm confused about how to prepare my training data and feed it to a Keras neural network.
The input shape in Keras does not necessarily match the shape of the input dataset. The input shape is the shape of a single data point in the dataset (the shape of the input dataset without the batch dimension). You are specifying the input shape as the shape of the whole dataset, including the batch dimension. The correct input shape in your case is Xtrain.shape[1:].
model = Sequential()
model.add(Dense(50, input_shape=Xtrain.shape[1:]))
model.add(Dense(Ytrain.shape[1]))
model.add(Activation("softmax"))
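Both points can be sketched in NumPy with toy sizes standing in for (6960, 3, 4048) and 50 units: `x.shape[1:]` drops the batch axis, and a Dense layer applied to a 3-D input works on the last axis only (modeled here as a matrix product; `dense_forward` is a hypothetical name). One consequence worth checking, as an inference beyond the answer: with the corrected input_shape the model output would be shaped (None, 3, vocab_size) while Ytrain is 2-D, so a Flatten layer or a different architecture may still be needed.

```python
import numpy as np


def dense_forward(x, kernel, bias):
    """Dense-style layer: contracts only the last axis, (..., n_in) @ (n_in, n_out) -> (..., n_out)."""
    return x @ kernel + bias


batch, steps, vocab, units = 4, 3, 12, 5         # toy stand-ins for (6960, 3, 4048) and 50
x = np.zeros((batch, steps, vocab))
print(x.shape[1:])                                # (3, 12): the per-sample shape to pass as input_shape
out = dense_forward(x, np.ones((vocab, units)), np.zeros(units))
print(out.shape)                                  # (4, 3, 5): the steps axis is kept
```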

Keras dense layer shape mismatch

I am trying to make a multiclass classifier in Keras, but I am getting a dimension mismatch in the Dense layer.
MAX_SENT_LENGTH = 100
MAX_SENTS = 15
EMBEDDING_DIM = 100
x_train = data[:-nb_validation_samples]
y_train = labels[:-nb_validation_samples]
x_val = data[-nb_validation_samples:]
y_val = labels[-nb_validation_samples:]
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SENT_LENGTH,
                            trainable=True)
sentence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sentence_input)
l_lstm = Bidirectional(LSTM(100))(embedded_sequences)
sentEncoder = Model(sentence_input, l_lstm)
review_input = Input(shape=(MAX_SENTS,MAX_SENT_LENGTH), dtype='int32')
review_encoder = TimeDistributed(sentEncoder)(review_input)
l_lstm_sent = Bidirectional(LSTM(100))(review_encoder)
preds = Dense(7, activation='softmax')(l_lstm_sent)
model = Model(review_input, preds)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=10, batch_size=50)
The class labels are correctly transformed into one-hot vectors, but when I try to fit the model, I get this mismatch error:
('Shape of data tensor:', (5327, 15, 100))
('Shape of label tensor:', (5327, 7))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 15, 100) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 15, 200) 351500
_________________________________________________________________
bidirectional_2 (Bidirection (None, 200) 240800
_________________________________________________________________
dense_1 (Dense) (None, 7) 1407
=================================================================
Total params: 592,501
Trainable params: 592,501
Non-trainable params: 0
_________________________________________________________________
None
ValueError: Error when checking target: expected dense_1 to have shape (None, 1) but got array with shape (4262, 7)
Where does this (None, 1) dimension come from and how can I solve this error?
You should use loss='categorical_crossentropy' instead of loss='sparse_categorical_crossentropy' if your labels are one-hot encoded. 'sparse_categorical_crossentropy' takes integer labels, which is why a target of shape (None, 1) is expected.
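Alternatively, the one-hot labels can be turned back into integers with argmax and the sparse loss kept. The NumPy sketch below (hypothetical helper names, random toy probabilities) checks that the two losses give the same per-sample values when the labels agree:

```python
import numpy as np


def categorical_ce(one_hot_labels, probs):
    # one-hot targets shaped (n, classes): the format loss='categorical_crossentropy' expects
    return -np.sum(one_hot_labels * np.log(probs), axis=-1)


def sparse_ce(int_labels, probs):
    # integer targets shaped (n,): the format loss='sparse_categorical_crossentropy' expects
    return -np.log(probs[np.arange(len(int_labels)), int_labels])


rng = np.random.default_rng(1)
logits = rng.normal(size=(5, 7))                        # 5 samples, 7 classes as in the question
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y_onehot = np.eye(7)[rng.integers(0, 7, size=5)]        # shaped like the (n, 7) label tensor
y_int = y_onehot.argmax(axis=1)                         # back to integer labels, shaped (n,)
print(np.allclose(categorical_ce(y_onehot, probs), sparse_ce(y_int, probs)))  # True
```

In the question's setting, that would mean passing `y_train.argmax(axis=1)` and `y_val.argmax(axis=1)` to fit while keeping the sparse loss.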

My Keras Conv1D model doesn't recognize a substring, and instead produces the same value for every input

I want to use a Conv1D model to categorize DNA sequences, and as a starting point, I'd like to build a model that recognizes a specific subsequence in randomly generated DNA sequences.
Test Data
DNA consists of a sequence of one of 4 possible bases, A, T, G, and C. I represent my DNA in "one hot" encoding -- a sequence consisting of a single base, A, would therefore be represented as [1,0,0,0] -- yielding a matrix with dimensions SAMPLE_CT x SEQ_LENGTH x 4. For my test set, I generate a number of random sequences, and in a fraction of those sequences, I overwrite a portion of the sequence with a fixed "signal" sequence. Sequences are scored based on whether the signal is present or not -- 1 for sequences containing the signal, 0 for sequences lacking it:
import random
import numpy as np

SIGNAL_LEN = 80
COUNT, LENGTH, FRAC_PRESENT = 20000, 5000, 0.3
X = np.zeros((COUNT, LENGTH, 4), dtype='uint8')
Y = np.zeros((COUNT,), dtype='float16')
# random sub-sequence:
signal = [random.choice([0,1,2,3]) for _ in range(SIGNAL_LEN)]
for seq_i in range(COUNT):
    for idx in range(LENGTH):
        # flip random index to 1
        X[ seq_i, idx, random.choice([0,1,2,3]) ] = 1
    if random.random() < FRAC_PRESENT:
        start_pos = random.randint(100, LENGTH - (100 + SIGNAL_LEN))
        for idx in range(SIGNAL_LEN):
            X[ seq_i, start_pos + idx, : ] = 0
            X[ seq_i, start_pos + idx, signal[idx] ] = 1
        # a score of 1.0 means present; 0.0 means absent
        Y[ seq_i ] = 1.0
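The nested Python loops above visit all COUNT * LENGTH = 100,000,000 positions one at a time, which is slow. Below is a vectorized NumPy sketch of the same generation scheme. It assumes NumPy's Generator API in place of the random module (so the random draws differ from the original), and `make_dataset` is a hypothetical name.

```python
import numpy as np


def make_dataset(count, length, signal_len, frac_present, margin=100, seed=0):
    """Random one-hot DNA (bases indexed 0-3) with a fixed signal planted in a fraction of rows."""
    rng = np.random.default_rng(seed)
    eye = np.eye(4, dtype=np.uint8)                     # row k is the one-hot code for base k
    signal = rng.integers(0, 4, size=signal_len)        # the fixed sub-sequence to detect
    X = eye[rng.integers(0, 4, size=(count, length))]   # shape (count, length, 4) in one step
    Y = np.zeros((count,), dtype=np.float16)
    for i in np.flatnonzero(rng.random(count) < frac_present):
        # inclusive upper bound, like random.randint in the original loop
        start = rng.integers(margin, length - (margin + signal_len), endpoint=True)
        X[i, start:start + signal_len] = eye[signal]   # overwrite the window with the signal
        Y[i] = 1.0                                      # 1.0 = signal present
    return X, Y, signal


X_small, Y_small, signal = make_dataset(count=50, length=400, signal_len=80, frac_present=0.3)
print(X_small.shape, int(Y_small.sum()))
```

Only the per-sequence signal insertion still loops in Python, and it runs once per positive sequence rather than once per base.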
Keras Model
I cribbed my model from examples I found online. The output is the score described earlier, 1 for present and 0 for absent.
from keras.models import Sequential
from keras.layers import Conv1D, Activation, MaxPooling1D, Flatten, Dense
from keras.optimizers import Adam

WINDOW = 20
sample_ct, width, depth = X.shape
# define base model
def baseline_model():
    model = Sequential()
    model.add( Conv1D(filters=256, kernel_size=20,
                      input_shape=(width, depth),
                      kernel_initializer='uniform',
                      activation='relu') )
    model.add(Activation('sigmoid'))
    model.add(MaxPooling1D(pool_size=4))
    model.add( Conv1D(filters=128, kernel_size=20) )
    model.add(Activation('sigmoid'))
    model.add(MaxPooling1D(pool_size=4))
    model.add( Conv1D(1, kernel_size=20,
                      kernel_initializer='uniform',
                      activation='relu') )
    model.add( Flatten() )
    model.add( Dense(128, kernel_initializer='normal', activation='relu') )
    model.add( Dense(1, activation='sigmoid', name='output') )
    model.compile(loss='mean_squared_error', optimizer=Adam(lr=1e04))
    model.summary()
    return model
The model.summary looks like this:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_1 (Conv1D) (None, 4981, 256) 20736
_________________________________________________________________
activation_1 (Activation) (None, 4981, 256) 0
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1245, 256) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 1226, 128) 655488
_________________________________________________________________
activation_2 (Activation) (None, 1226, 128) 0
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 306, 128) 0
_________________________________________________________________
conv1d_3 (Conv1D) (None, 287, 1) 2561
_________________________________________________________________
flatten_1 (Flatten) (None, 287) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 36864
_________________________________________________________________
output (Dense) (None, 1) 129
=================================================================
Total params: 715,778
Trainable params: 715,778
Non-trainable params: 0
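The shapes in this summary follow from "valid" convolutions (the Conv1D default: output length = input length - kernel_size + 1 at stride 1) and max pooling with pool size 4 (output length = (input length - 4) // 4 + 1, since the stride defaults to the pool size). A short sketch that recomputes the lengths and the Conv1D parameter counts (kernel_size × input channels × filters, plus one bias per filter) as a sanity check; the helper names are hypothetical:

```python
def conv1d_len(n, kernel_size, stride=1):
    # 'valid' padding (the Conv1D default): no positions are padded
    return (n - kernel_size) // stride + 1


def pool1d_len(n, pool_size, stride=None):
    # MaxPooling1D with 'valid' padding; strides default to pool_size
    stride = stride or pool_size
    return (n - pool_size) // stride + 1


def conv1d_params(kernel_size, in_channels, filters):
    return kernel_size * in_channels * filters + filters   # weights + one bias per filter


n = 5000
lengths = []
for step in ("conv", "pool", "conv", "pool", "conv"):
    n = conv1d_len(n, 20) if step == "conv" else pool1d_len(n, 4)
    lengths.append(n)
print(lengths)   # [4981, 1245, 1226, 306, 287]
print(conv1d_params(20, 4, 256), conv1d_params(20, 256, 128), conv1d_params(20, 128, 1))
# 20736 655488 2561
```

Flatten then yields 287 values, so the first Dense layer has 287 × 128 + 128 = 36,864 parameters, matching the summary.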
I then fit my model against half my test data, and predict the other half:
from numpy.random import rand
model = baseline_model()
# initialize a random index
idx = rand(sample_ct) > 0.5
X_train, X_test = X[idx], X[~ idx]
Y_train, Y_test = Y[idx], Y[~ idx]
model.fit( X_train,
{'output': Y_train},
epochs=3)
pred = model.predict( X_test )
However, all my predictions are 0, with a MSE of 0.2994 after 3 epochs. What am I doing wrong?
