My fit function is breaking after running the first epoch [duplicate]

This question already has answers here:
Tensorflow : logits and labels must have the same first dimension
(7 answers)
Closed 10 hours ago.
I am new to ML, so please go easy on me. I might be missing something simple, but such is the case with programming in general.
I did a course on Freecodecamp.com for Machine Learning in Python and I'm now doing one of the examples involving CNNs, which is supposed to train the model to detect whether the incoming image contains either a cat or a dog.
I finally got my model working today with a 75% accuracy, but I wasn't sure if it was using the validation data correctly, because somewhere along the journey I chose to set my validation classes as classes=['.'] and that's when it was working (see below).
val_data_gen = validation_image_generator.flow_from_directory(
    validation_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=batch_size,
    classes=classes,  # classes=['.'] worked before idk why..
    class_mode="categorical")
After that, I noticed the problem and fixed it so that my validation data has the correct classes, but now my fit function runs for exactly one epoch every time and then throws the exception below (summarized):
50 try:
51 ctx.ensure_initialized()
---> 52 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
53 inputs, attrs, num_outputs)
54 except core._NotOkStatusException as e:
InvalidArgumentError: Graph execution error:
classes is defined as classes = ["cats", "dogs"] and my fit function is below for reference:
history = model.fit(
    train_data_gen,
    validation_data=(val_data_gen, classes),
    epochs=epochs,
    batch_size=batch_size,
    validation_steps=len(val_data_gen)
)
Here is the Google Colab link if you would like to see a little more detail.
I have tried passing classes to the validation_data parameter (as the validation labels)
history = model.fit(
    train_data_gen,
    validation_data=(val_data_gen, classes),
    epochs=epochs,
    batch_size=batch_size,
    validation_steps=len(val_data_gen)
)
I have tried to see if it can work without passing the classes, such as validation_data=val_data_gen.
I have tried changing the last dense layer value to 1: model.add(layers.Dense(1)), but I know that's wrong because I have 2 categories/classes, and I believe I got the same result in the end.
I have also tried adding and removing the batch_size and validation_steps parameters, as suggested in other Stack Overflow questions, but the only similar question I found on this website was from a person who was passing the wrong value to the Dense layer, which does not seem to be my problem.
This is my model structure:
model = Sequential()
model.add(keras.Input(shape=(IMG_HEIGHT, IMG_WIDTH, 3)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(2, 2))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(2, 2))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D(2, 2))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(2))
and this is my compile method:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
Thank you all for your time and patience, and let me know if you want me to try anything.

After Pavel's suggestions, my code started crashing before starting the first epoch, but I got a more detailed error message: logits and labels must have the same first dimension. To solve my problem, I looked this error up and found a solution on SO. I had to change my loss function from SparseCategoricalCrossentropy to CategoricalCrossentropy, and it works now. I ran my fit function again and I'm now getting 85% accuracy.
Hope this helps someone.
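The fix above comes down to matching the loss to the label format. Below is a minimal, framework-independent sketch in plain Python (not the Keras implementation; the function names and the example logits are made up for illustration). With class_mode="categorical" the generator yields one-hot labels such as [0, 1], which is the format CategoricalCrossentropy expects, whereas SparseCategoricalCrossentropy expects one integer class index per sample, such as 1. Feeding one-hot labels to the sparse variant is a common cause of the "same first dimension" error.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_label, logits):
    """Loss for a one-hot label such as [0, 1] (class_mode="categorical")."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))

def sparse_categorical_crossentropy(class_index, logits):
    """Loss for an integer label such as 1 (class_mode="sparse")."""
    probs = softmax(logits)
    return -math.log(probs[class_index])

logits = [0.3, 1.2]  # hypothetical output of Dense(2) for one image
loss_one_hot = categorical_crossentropy([0, 1], logits)
loss_sparse = sparse_categorical_crossentropy(1, logits)
```

Both calls produce the same loss value; the two variants differ only in the label format they accept. Keeping SparseCategoricalCrossentropy and switching both generators to class_mode="sparse" should therefore be an equivalent fix, although I have not tested that on this notebook.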

Related

Explosion in loss function, LSTM autoencoder

I am training a LSTM autoencoder, but the loss function randomly shoots up as in the picture below:
I tried multiple things to prevent this, such as adjusting the batch size and the number of neurons in my layers, but nothing seems to help. I checked my input data for null and infinity values, but it contains none, and it is also normalized. Here is my code for reference:
model = Sequential()
model.add(Masking(mask_value=0, input_shape=(430, 3)))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2, activation='relu'))
model.add(RepeatVector(430))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(3)))
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
context_paths = loadFile()
X_train, X_test = train_test_split(context_paths, test_size=0.20)
history = model.fit(X_train, X_train, epochs=1, batch_size=4, verbose=1, validation_data=(X_test, X_test))
The loss function explodes at random points in time, sometimes sooner, sometimes later. I read this thread about possible problems, but at this point after trying multiple things I am not sure what to do to prevent the loss function from skyrocketing at random. Any advice is appreciated. Other than this I can see that my accuracy is not increasing very much, so the problems may be interconnected.

Two main points:
1st point: as highlighted by Daniel Möller, don't use 'relu' for LSTM; leave the standard activation, which is 'tanh'.
2nd point: one way to fix the exploding gradient is to use clipnorm or clipvalue for the optimizer.
Try something like this, then pass opt as the optimizer in model.compile:
For clipnorm:
opt = tf.keras.optimizers.Adam(clipnorm=1.0)
For clipvalue:
opt = tf.keras.optimizers.Adam(clipvalue=0.5)
See this post for help (previous version of TF):
How to apply gradient clipping in TensorFlow?
And this post for general explanation:
https://machinelearningmastery.com/how-to-avoid-exploding-gradients-in-neural-networks-with-gradient-clipping/
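To make the two options concrete, here is a minimal plain-Python sketch of what clipping does to a single gradient vector. It is an illustration under simplifying assumptions, not the Keras implementation, and the helper names clip_by_norm and clip_by_value are made up for this example.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm (direction is kept)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * max_norm / norm for g in grad]

def clip_by_value(grad, limit):
    """Clamp every component of grad to the range [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grad]

exploding = [3.0, -4.0]                    # hypothetical gradient with L2 norm 5
by_norm = clip_by_norm(exploding, 1.0)     # norm shrinks to 1, direction unchanged
by_value = clip_by_value(exploding, 0.5)   # each component limited to +/-0.5
```

Clipping by norm preserves the direction of the update and only shortens it, while clipping by value can change the direction because each component is limited independently.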

Two main issues:
Don't use 'relu' for LSTM; leave the standard activation, which is 'tanh'. Because LSTMs are "recurrent", it's very easy for them to accumulate growing or shrinking values to the point of making the numbers useless.
Check the range of your data X_train and X_test. Make sure the values are not too big. Something between -4 and +4 is sort of good. You should consider normalizing your data if it's not normalized yet.
Notice that "accuracy" doesn't make any sense for problems that are not classification. (I notice your final activation is "linear", so you're not doing classification, right?)
Finally, if the two hints above don't work, check whether you have an example that is all zeros. This might create a "full mask" sequence, and this "might" (I don't know) cause a bug.
(X_train == 0).all(axis=(1, 2)).any()  # should be False

Extracting Classification Module From the CapsNet

With reference to the concept of the Capsule Network, I am trying to extract just the classification module from Intel's implementation of Capsules in Keras, as I don't need the decoder or reconstruction part that is provided in the mentioned link.
My Try:
So I just commented out the decoder part of the network
#dec = Dense(512, activation='relu')(mask)
#dec = Dense(1024, activation='relu')(dec)
#dec = Dense(784, activation='sigmoid')(dec)
#dec = Reshape(input_shape)(dec)
and the decoder argument in the following line
#model = Model([x, mask_input], [output_capsule, dec])
model = Model([x, mask_input], [output_capsule])
model.compile(optimizer='adam', loss=[ margin_loss, 'mae' ], metrics=[ margin_loss, 'mae', 'accuracy'])
model.fit([X, Y], [Y, X], batch_size=128, epochs=3, validation_split=0.2)
Error
I am getting the following error.
ValueError: When passing a list as loss, it should have one entry per model outputs. The model has 1 outputs, but you passed loss=[<function margin_loss at 0x0000020C3E7A30D0>, 'mae']
Help Required:
Can somebody guide me on how to use only the classification part of the module? My images have dimensions 90 x 90, and I want to use the classification part to check the accuracy and later analyze each capsule.

To answer why you are getting that error:
The original model has two outputs. Each output has its own loss function. In this case, the outputs are [output_capsule, dec] and the corresponding loss functions are [margin_loss, 'mae']. Because you removed the dec output, you need to remove its loss function from the compilation of the model.
In addition, you will need to make sure that you are passing the correct inputs and outputs everywhere else. In the code that you have here, that matters for model.fit. The first argument is the inputs. You still have two inputs to your model (not sure if you want two inputs, but that's a different problem), so passing in a list of two inputs to fit is good. However, the second argument is the desired outputs. You're currently passing in two output arrays, but you only have one output for your model. Since you removed your model's second output, you should remove the second output that you pass in to fit, i.e. X.
Modified code:
model = Model([x, mask_input], [output_capsule])
model.compile(optimizer='adam', loss=[margin_loss], metrics=[margin_loss, 'mae', 'accuracy'])
model.fit([X, Y], [Y], batch_size=128, epochs=3, validation_split=0.2)

My Neural Network isn't learning (negative R_Squared, always same loss, categorical input data, regression)

I am trying to get my neural network to work, but unfortunately it looks like I am missing something.
I have input data from different categories.
For example, the type of a machine ('abc', 'bcd', 'dca').
So one line of my input contains different words from different distinct word categories. At the moment I have ~70,000 samples with 12 features.
First I use sklearn's LabelEncoder to transform every word into a number.
The vocabulary size goes up to 17903.
My simple network looks like this:
#Start with the NN
model = tf.keras.Sequential([
tf.keras.layers.Embedding(np.amax(ml_input)+1, 300, input_length = x_train.shape[1]),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(500, activation=tf.keras.activations.softmax),
tf.keras.layers.Dense(1, activation = tf.keras.activations.linear)
])
model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.01),
loss=tf.keras.losses.mean_absolute_error,
metrics=[R_squared])
model.summary()
#Train the Model
callback = [tf.keras.callbacks.EarlyStopping(monitor='loss', min_delta=5.0, patience=15),
tf.keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.1, patience=5, min_delta=5.00, min_lr=0)
]
history = model.fit(x_train, y_train, epochs=50, batch_size=64, verbose =2, callbacks = callback)
The loss after the first epoch is about 120 and after two epochs about 70, but from then on it doesn't change anymore. So after two epochs my net isn't learning anymore.
I already tried other loss functions, standardizing my labels (they range from 3 to 500 minutes), more neurons, another dense layer, and another activation function. But after two epochs the loss is always 70. My R_Squared is something like -0.02; it changes but always stays negative and near 0.
It seems like my network isn't learning at all.
Does anyone have an idea of what I am doing wrong?
Thanks for your help!

How can I intentionally overfit a convolutional neural net in Keras to make sure the model is working?

I'm trying to diagnose what's causing low accuracies when training my model. At this point, I just want to be able to get to high training accuracies (I can worry about testing accuracy/overfitting problems later). How can I adjust the model to overindex on training accuracy? I want to do this to make sure I didn't make any mistakes in a preprocessing step (shuffling, splitting, normalizing, etc.).
# PARAMS
dropout_prob = 0.2
activation_function = 'relu'
loss_function = 'categorical_crossentropy'
verbose_level = 1
convolutional_batches = 32
convolutional_epochs = 5
inp_shape = X_train.shape[1:]
num_classes = 3

def train_convolutional_neural():
    y_train_cat = np_utils.to_categorical(y_train, 3)
    y_test_cat = np_utils.to_categorical(y_test, 3)
    model = Sequential()
    model.add(Conv2D(filters=16, kernel_size=(3, 3), input_shape=inp_shape))
    model.add(Conv2D(filters=32, kernel_size=(3, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(rate=dropout_prob))
    model.add(Flatten())
    model.add(Dense(64, activation=activation_function))
    model.add(Dense(num_classes, activation='softmax'))
    model.summary()
    model.compile(loss=loss_function, optimizer="adam", metrics=['accuracy'])
    history = model.fit(X_train, y_train_cat, batch_size=convolutional_batches, epochs=convolutional_epochs, verbose=verbose_level, validation_data=(X_test, y_test_cat))
    model.save('./models/convolutional_model.h5')

You need to remove the Dropout layer. Here is a small checklist for intentional overfitting:
Remove any regularization (Dropout, L1 and L2 regularization).
Make sure to set a lower learning rate (Adam is adaptive, so in your case it is fine).
You may want to not shuffle the training samples (e.g. the first 100 samples are all class A, the next 100 are class B, the last 100 are class C). Update: as pointed out by petezurich in the answer below, this should be considered with care, as it could lead to no training effect at all.
Now, if your model overfits easily, that is a good sign of a strong model that is capable of representing the data. Otherwise, you may consider a deeper/wider model, or you should take a good look at the data and ask the question: "Are there really any patterns? Is this trainable?".

In addition to the other valid answers – one very simple way to overfit is to use only a small subset of your data. E.g. only 1 or 2 samples.
See also this extremely helpful post regarding everything that you can check to make sure your model is working: https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607
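The tiny-subset check can be sketched without Keras. The example below is a toy stand-in for the real model, assuming a one-feature logistic regression trained with plain gradient descent; the data, the learning rate and the helper names are made up for illustration. The idea carries over directly: train on something like X_train[:2] and y_train[:2] and confirm that training accuracy reaches 100%.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Fit a one-feature logistic regression with plain gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = sigmoid(w * x + b) - y  # gradient of the log loss w.r.t. the logit
            w -= lr * error * x
            b -= lr * error
    return w, b

def accuracy(w, b, samples, labels):
    """Fraction of samples whose predicted class matches the label."""
    hits = sum((sigmoid(w * x + b) >= 0.5) == bool(y)
               for x, y in zip(samples, labels))
    return hits / len(samples)

# Hypothetical two-sample subset; in practice: X_train[:2], y_train[:2].
tiny_x = [-1.0, 1.0]
tiny_y = [0, 1]
w, b = train_logistic(tiny_x, tiny_y)
train_acc = accuracy(w, b, tiny_x, tiny_y)
```

If a model cannot memorize even two samples, the problem is more likely in the preprocessing, the labels or the loss than in the model capacity.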

What does it mean when the test scores are more than 100?

The MLP I created when tested on test sets shows a test score more than 100 multiple times. Could there be any mistake in coding or the data entered?
My code:
model = Sequential()
model.add(Dense(3, input_dim = 6))
model.add(Dense(3, activation='tanh'))
model.add(Dense(1))
opt = optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=opt , loss='mean_squared_error')
model.fit(x, y, epochs=ep, batch_size = 50 ,verbose=0)
test_score = model.evaluate(test_x, test_y, verbose = 0)
test_score = sqrt(test_score)
test_score = get_unscaled (test_sf, np.array([test_score]))

model.evaluate can return two types of values:
Scalar: If you have not explicitly passed anything to the metrics argument of model.compile, it returns only the loss (which, if it is mean squared error, can be any non-negative real number).
List of scalars: If you have passed metrics to model.compile, model.evaluate returns a list of scalars; the first element is the loss and all others are the values of the metrics you passed.
To solve your question simply, pass your desired metric (say accuracy) to model.compile, like model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy']). Running model.evaluate will then return [loss, accuracy]. Refer to this.
You need to understand what you're doing before you start coding. It seems you are unclear on the meaning of mean squared error. Please do some reading up, both the theory and the Keras documentation, first.
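As a concrete illustration of why such a score is not capped at 100, here is a minimal plain-Python sketch of the root mean squared error, which is what taking sqrt of the mean squared error loss gives. The targets and predictions are made-up numbers, and the rmse helper is written for this example rather than taken from Keras.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the same units as the targets."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical unscaled targets and predictions.
y_true = [100.0, 200.0, 300.0]
y_pred = [250.0, 50.0, 450.0]
score = rmse(y_true, y_pred)  # every prediction is off by 150, so the RMSE is 150
```

The result is an error in the units of the target, not a percentage, so values above 100 are entirely possible once the score is unscaled.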
