How do keras LSTM input and output shapes work? - python

def preprocessor():
    train = timeseries_dataset_from_array(
        trainX, trainY, sequence_length=len(train), batch_size=batchTrain
    )
    val = timeseries_dataset_from_array(
        valX, valY, sequence_length=len(val), batch_size=batchVal
    )
    test = timeseries_dataset_from_array(
        testX, testY, sequence_length=len(test), batch_size=batchTest
    )
    return train, val, test

train, val, test = preprocessor()

model = Sequential()
model.add(LSTM(4, return_sequences=True))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='Adam', loss="mae")
model.fit(train, epochs=200, verbose=2, validation_data=val, shuffle=False)
I'm trying to make an LSTM from time-series data, and when I run the above, the loss doesn't change at all. I'm definitely struggling to understand how LSTM input/output shapes work. I've read as much online as I could find, but I can't seem to get the model to learn. I'm under the impression that the first argument is the dimensionality of the output space. I want the LSTM to return the whole sequence to the output function.

There are several problems in your model. Your final layer is a Dense layer with two units, and you are using softmax, which should be replaced by sigmoid. Since you are using softmax, I guess that you are using this model for classification and not regression.
If you are using the model for a classification task, then you should use BinaryCrossentropy, not MeanAbsoluteError, as the loss.
To answer the question in full detail, you need to post additional information. For example: what are your target variables?
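In the meantime, here is a minimal sketch of the fixes above, assuming a binary classification target with one label per sequence: a single sigmoid unit plus binary cross-entropy (the layer size and the train/val datasets come from the question).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(4))                          # last hidden state only: one label per sequence
model.add(Dense(1, activation='sigmoid'))   # sigmoid instead of softmax
model.compile(optimizer='adam',
              loss='binary_crossentropy',   # cross-entropy instead of MAE
              metrics=['accuracy'])
model.fit(train, epochs=200, verbose=2, validation_data=val, shuffle=False)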

Related

Tensorflow model pruning gives 'nan' for training and validation losses

I'm trying to prune a base model that consists of several layers on top of a VGG network. It also contains a user-defined layer named instance_normalization. For pruning to be successful, I've defined the get_prunable_weights function of this layer as follows:
### defined for model pruning
def get_prunable_weights(self):
    return self.weights
I used the following function to obtain a to-be-pruned model structure using a base model named model:
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

def define_prune_model(self, model, img_shape, epochs, batch_size, validation_split=0.1):
    num_images = img_shape[0] * (1 - validation_split)
    end_step = np.ceil(num_images / batch_size).astype(np.int32) * epochs

    # Define model for pruning.
    pruning_params = {
        'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.5,
                                                                 final_sparsity=0.80,
                                                                 begin_step=0,
                                                                 end_step=end_step)
    }
    model_for_pruning = prune_low_magnitude(model, **pruning_params)
    model_for_pruning.compile(optimizer='adam',
                              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                              metrics=['accuracy'])
    model_for_pruning.summary()
    return model_for_pruning
Then, I wrote the following function to perform training on this pruning model:
def train_prune_model(self, model_for_pruning, train_images, train_labels,
                      epochs, batch_size, validation_split=0.1):
    callbacks = [
        tfmot.sparsity.keras.UpdatePruningStep(),
        tfmot.sparsity.keras.PruningSummaries(log_dir='./models/pruned'),
    ]
    model_for_pruning.fit(train_images, train_labels,
                          batch_size=batch_size, epochs=epochs, validation_split=validation_split,
                          callbacks=callbacks)
    return model_for_pruning
However, when training, I found that the training and validation losses were all nan and the final model prediction output was all zeros, even though the base model passed to define_prune_model had trained successfully and predicted correctly.
How can I solve this? Thank you in advance.
It is difficult to pinpoint the issue without more information. In particular, can you please give more detail (preferably as code) about your custom instance_normalization layer?
Assuming that the code is fine: since you mentioned that the model trains correctly without pruning, could it be that those pruning parameters are too harsh? After all, those options set 50% of the weights to zero right from the first learning step.
Here is what I would try (a sketch combining these ideas follows this list):
Experiment with a lower level of sparsity (especially initial_sparsity).
Start applying pruning later in training (the begin_step argument of the pruning schedule). Some even prefer to train the model once without applying pruning at all, then re-train with prune_low_magnitude().
Only prune at some steps, giving the model time to recover between prunings (the frequency argument).
Finally, should it still fail, try the usual cures for nan losses: reduce the learning rate, use regularization or gradient clipping, etc.
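A hedged sketch of a gentler schedule along these lines; the concrete step values are placeholder assumptions to adapt to your own run, and end_step is computed as in define_prune_model above.
import tensorflow_model_optimization as tfmot

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0,   # start dense instead of zeroing 50% of weights at once
        final_sparsity=0.5,     # prune less aggressively overall
        begin_step=1000,        # let the model train unpruned for a while first
        end_step=end_step,
        frequency=200,          # prune less often so the model can recover in between
    )
}
model_for_pruning = prune_low_magnitude(model, **pruning_params)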

Keras RNN accuracy doesn't improve

I'm trying to improve my model so it can become a bit more accurate. Right now I'm training the model and get this as my training and validation accuracy.
For every epoch I get a training accuracy of 0.0003 and a validation accuracy of 0. I know this isn't good, but I don't know how I can fix it.
The data is normalized with the MinMax scaler. 4 of the 8 features are normalized (the other 4 are hour, day, day_of_week and month).
Update:
I've also tried to normalize the entire dataset, and it doesn't make a difference.
scaling = MinMaxScaler(feature_range=(0,1)).fit(df[cols])
df[[cols]] = scaling.transform(df[[cols]])
My model: The shape is (5351, 1, 8)
and the input_shape is (1, 8)
model = keras.Sequential()
model.add(keras.layers.Bidirectional(keras.layers.LSTM(2, input_shape=(X_train.shape[1], X_train.shape[2]),
                                                       return_sequences=True, activation='linear')))
model.add(keras.layers.Dense(1))
model.compile(loss='mean_squared_error', optimizer='Adamax', metrics=['acc'])
history = model.fit(
    X_train, y_train,
    epochs=200,
    batch_size=24,
    validation_split=0.35,
    shuffle=False,
)
I tried using the answer to this question:
Keras model accuracy not improving
but it didn't work.
A mean_squared_error loss is for regression tasks, while an acc metric is for classification problems, so it makes no sense to use them together.
If you are working on a classification problem, use binary_crossentropy or categorical_crossentropy as the loss and keep the metric as you did.
If it is a regression task, change the metric to [mse] (mean squared error) instead of [acc].
Your model "works" and you have applied the standard formula for backpropagation by using the mean squared error loss. But measuring accuracy makes Keras check whether your model's output is EXACTLY equal to the expected values. Since the loss function is for regression, it will hardly ever be equal.
Three last points, because that little change won't correct everything.
Firstly, your last Dense layer should have an activation function. (It's safer.)
Secondly, I'm pretty sure a Bidirectional+LSTM layer placed before a Dense layer should have return_sequences=False. An LSTM layer (with or without Bidirectional) can return the full sequence of vectors (like a matrix), but a Dense layer takes vectors as input. In this case it happens to work because of the third point.
The last point is about the shape of your data. You have 5351 examples of shape (1, 8), each of which is a vector of size 8. But an LSTM layer takes a sequence of vectors, and the size of your sequence is one. I don't know if it is relevant to use an RNN-type layer here. A sketch combining these points follows.
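A minimal sketch of these fixes, assuming a regression target; X_train and y_train are the arrays from the question, and everything else follows the points above.
from tensorflow import keras

model = keras.Sequential()
model.add(keras.layers.Bidirectional(
    keras.layers.LSTM(2, return_sequences=False),      # return a vector, not a sequence
    input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(keras.layers.Dense(1, activation='linear'))  # explicit activation for regression
model.compile(loss='mean_squared_error', optimizer='Adamax',
              metrics=['mse'])                         # regression metric instead of 'acc'
model.fit(X_train, y_train, epochs=200, batch_size=24,
          validation_split=0.35, shuffle=False)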

Determine number of Nodes and Layers based on shape of the data

Is there a way to determine the number of nodes and hidden layers based on the shape of the data?
Also, is there a way to determine the best activation function based on the topic?
For example, I'm making a model for fake-news prediction. My features are the number of words in the text, the number of words in the title, the number of questions, the number of capital letters, etc.
My dataset has 22 features and around 35000 rows. My output should be 0 or 1.
Based on that, how many layers and nodes should I use, and which activation functions are best for this?
This is my net:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import optimizers

model = Sequential()
model.add(Dense(100, input_dim=features.shape[1], activation='relu'))  # input layer requires input_dim param
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(1, activation='sigmoid'))  # sigmoid instead of relu for final probability between 0 and 1
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss="mean_squared_error", optimizer=sgd, metrics=['accuracy'])

# fit the model to the data (train the network)
model.fit(x_train, y_train, epochs=10, shuffle=True, batch_size=32,
          validation_data=(x_test, y_test), verbose=1)
scores = model.evaluate(features, results)
print(model.metrics_names[1], scores[1] * 100)
Selecting those requires prior experience; otherwise we wouldn't need so many ML engineers trying different architectures and writing papers.
For a start, I would recommend you take a look at AutoKeras. It will help with your problem, since it is a fairly well-known problem (text classification): you only need to structure your data as input (X and Y) and then feed it to their TextClassifier, which will try different models (you can specify how many) to choose the best fit for your case.
You can find more examples in the docs here:
https://autokeras.com/tutorial/text_classification/
import autokeras as ak
# Initialize the text classifier.
clf = ak.TextClassifier(max_trials=10) # It tries 10 different models
# Feed the text classifier with training data.
clf.fit(x_train, y_train)
# Predict with the best model.
predicted_y = clf.predict(x_test)
# Evaluate the best model with testing data.
print(clf.evaluate(x_test, y_test))
The answer is no and no.
These are also hyperparameters. You can select a bunch of them and try them all to get a rough idea of which gives you the best result. The same statement holds for the activation function as well.
You can use more layers than you need and then use regularization to stop producing an overfitted model. Conversely, if the network is too small, you can clearly see the underfitting behavior from the loss curve: it gives a high training error.
There is no formula for determining any of this. You have to try different things based on the problem at hand, and you will see that some of them work better than others.
For the output, a softmax layer would be good, as it gives you a probability for each prediction, which you can easily convert to a one-hot prediction. A sketch of that output variant follows.
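Here is that sketch, hedged: the input_dim of 22 matches the question's feature count, the 0/1 labels are assumed to be one-hot encoded, and the hidden layers follow the question's model rather than a recommendation.
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import optimizers
from keras.utils import to_categorical

model = Sequential()
model.add(Dense(100, input_dim=22, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(2, activation='softmax'))  # one probability per class
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

# 0/1 labels become one-hot vectors for the two softmax units:
# y_onehot = to_categorical(y, num_classes=2)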

Extracting Classification Module From the CapsNet

With reference to the concept of Capsule Networks, I am trying to extract just the classification module from Intel's implementation of capsules in Keras, as I don't need the decoder or reconstruction part provided in the mentioned link.
My Try:
So I just commented out the decoder part of the network
#dec = Dense(512, activation='relu')(mask)
#dec = Dense(1024, activation='relu')(dec)
#dec = Dense(784, activation='sigmoid')(dec)
#dec = Reshape(input_shape)(dec)
and the decoder argument in the following line
#model = Model([x, mask_input], [output_capsule, dec])
model = Model([x, mask_input], [output_capsule])
model.compile(optimizer='adam', loss=[ margin_loss, 'mae' ], metrics=[ margin_loss, 'mae', 'accuracy'])
model.fit([X, Y], [Y, X], batch_size=128, epochs=3, validation_split=0.2)
Error
I am getting the following error.
ValueError: When passing a list as loss, it should have one entry per model outputs. The model has 1 outputs, but you passed loss=[<function margin_loss at 0x0000020C3E7A30D0>, 'mae']
Help Required:
Can somebody guide me on how to use only the classification part of the module? I have images of dimension 90 x 90, and I want to use the classification part to check the accuracy and later analyze each capsule.
To answer why you are getting that error:
The original model has two outputs, and each output has its own loss function. In this case, the outputs are [output_capsule, dec] and the corresponding loss functions are [margin_loss, 'mae']. Because you removed the dec output, you need to remove its loss function from the compilation of the model.
In addition, you will need to make sure that you are passing the correct inputs and outputs everywhere else. In the code that you have here, that matters for model.fit. The first argument is the inputs. You still have two inputs to your model (not sure if you want two inputs, but that's a different problem), so passing in a list of two inputs to fit is good. However, the second argument is the desired outputs. You're currently passing in two output arrays, but you only have one output for your model. Since you removed your model's second output, you should remove the second output that you pass in to fit, i.e. X.
Modified code:
model = Model([x, mask_input], [output_capsule])
model.compile(optimizer='adam', loss=[margin_loss], metrics=[margin_loss, 'mae', 'accuracy'])
model.fit([X, Y], [Y], batch_size=128, epochs=3, validation_split=0.2)

How to train a model with only an Embedding layer in Keras and no labels

I have some text without any labels. Just a bunch of text files. And I want to train an Embedding layer to map the words to embedding vectors. Most of the examples I've seen so far are like this:
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

model = Sequential()
model.add(Embedding(max_words, embedding_dim, input_length=maxlen))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
model.fit(x_train, y_train,
          epochs=10,
          batch_size=32,
          validation_data=(x_val, y_val))
They all assume that the Embedding layer is part of a bigger model which tries to predict a label. But in my case, I have no label. I'm not trying to classify anything. I just want to train the mapping from words (more precisely, integers) to embedding vectors. But the fit method of the model asks for x_train and y_train (as in the example given above).
How can I train a model only with an Embedding layer and no labels?
[UPDATE]
Based on the answer I got from @Daniel Möller, the Embedding layer in Keras implements a supervised algorithm and thus cannot be trained without labels. Initially, I thought it was a variation of Word2Vec and thus did not need labels to be trained. Apparently, that's not the case. Personally, I ended up using fastText, which has nothing to do with Keras or Python.
Does it make sense to do that without a label/target?
How will your model decide which values in the vectors are good for anything if there is no objective?
All embeddings are "trained" for a purpose. If there is no purpose, there is no target; if there is no target, there is no training.
If you really want to transform words into vectors without any purpose/target, you've got two options:
Make one-hot encoded vectors. You may use the Keras to_categorical function for that (see the small example after this list).
Use a pretrained embedding. There are some available, such as GloVe, embeddings from Google, etc. (All of them were trained at some point for some purpose.)
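A tiny illustration of option 1; the word ids and the vocabulary size of 5 are arbitrary assumptions.
from keras.utils import to_categorical

word_ids = [0, 2, 4]
one_hot = to_categorical(word_ids, num_classes=5)  # array of shape (3, 5)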
A very naive approach based on our chat, considering word distance
Warning: I don't really know anything about Word2Vec, but I'll try to show how to add the rules for your embedding using some naive kind of word distance and how to use dummy "labels" just to satisfy Keras' way of training.
from keras.layers import Input, Embedding, Subtract, Lambda
import keras.backend as K
from keras.models import Model
input1 = Input((1,)) #word1
input2 = Input((1,)) #word2
embeddingLayer = Embedding(...params...)
word1 = embeddingLayer(input1)
word2 = embeddingLayer(input2)
#naive distance rule, subtract, expect zero difference
word_distance = Subtract()([word1,word2])
#reduce all dimensions to a single dimension
word_distance = Lambda(lambda x: K.mean(x, axis=-1))(word_distance)
model = Model([input1,input2], word_distance)
Now that our model directly outputs a word distance, our labels will be "zero"; they're not really labels for supervised training, but they're the expected result of the model, something necessary for Keras to work.
We can use, for instance, mae (mean absolute error) or mse (mean squared error) as the loss function.
model.compile(optimizer='adam', loss='mse')
And training with word2 being the word after word1:
import numpy as np

xTrain = entireText
xTrain1 = entireText[:-1]
xTrain2 = entireText[1:]
yTrain = np.zeros((len(xTrain1),))

model.fit([xTrain1, xTrain2], yTrain, .... more params.... )
Although this may be completely wrong regarding what Word2Vec really does, it shows the main points, which are:
Embedding layers don't have special properties; they're just trainable lookup tables.
Rules for creating an embedding should be defined by the model and its expected outputs.
A Keras model will need "targets", even if those targets are not "labels" but a mathematical trick for an expected result.
