Keras custom loss becomes NaN after a while for regression - python

I am trying to use a custom loss function for my model. I scale the y values beforehand and inverse-scale them inside the loss function (using the answer from "scaling back data in customized keras training loss function"). After a random number of epochs the loss becomes NaN, and mean_absolute_error, val_mean_absolute_error and val_loss are all NaN as well. Here's my model and custom loss function:
model = Sequential()
model.add(LSTM(units=512, activation="tanh", return_sequences=True, input_shape=(X_train.shape[1],X_train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=256, activation="tanh", return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=128, activation="tanh", return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=64, activation="tanh"))
model.add(Dropout(0.2))
model.add(Dense(units = 2))
model.compile(optimizer = "Adam", loss = my_loss_function , metrics=['mean_absolute_error'])
model.summary()
I have 2 outputs as you can see.
def my_loss_function(y_actual, y_predicted):
    y_actual = (y_actual - K.constant(y_scaler.min_)) / K.constant(y_scaler.scale_)
    y_predicted = (y_predicted - K.constant(y_scaler.min_)) / K.constant(y_scaler.scale_)
    a_loss = abs(y_actual[0]-y_predicted[0])*128000
    b_loss = abs(y_actual[1]-y_predicted[1])*27000
    loss= tf.math.sqrt(tf.square(a_loss) + tf.square(b_loss))
    return loss
y_scaler is used earlier:
y_scaler = MinMaxScaler(feature_range = (0, 1))
y_scaler.fit(y_data)
y_data=y_scaler.transform(y_data)
y_testdata=y_scaler.transform(y_testdata)
Can anyone help?
When I use MSE, MAE etc. it works fine
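One possible cause, offered as an assumption rather than a confirmed diagnosis, is the tf.math.sqrt(tf.square(a_loss) + tf.square(b_loss)) step: the derivative of a square root is undefined at zero, so a sample whose two error terms are both exactly zero can yield a NaN gradient. The framework-free sketch below writes that derivative out by hand; grad_wrt_a and eps are hypothetical names used only for illustration.

```python
import math

def grad_wrt_a(a, b, eps=0.0):
    """Hand-written d/da of sqrt(a**2 + b**2 + eps)."""
    denom = math.sqrt(a * a + b * b + eps)
    if denom == 0.0:
        # Plain Python would raise ZeroDivisionError on 0/0; automatic
        # differentiation in a deep learning framework typically gives NaN.
        return math.nan
    return a / denom

# Both error terms are exactly zero (a perfect prediction on one sample).
unsafe = grad_wrt_a(0.0, 0.0)            # undefined gradient
safe = grad_wrt_a(0.0, 0.0, eps=1e-12)   # small constant keeps it finite
regular = grad_wrt_a(3.0, 4.0)           # away from zero nothing changes much

print(unsafe, safe, regular)
```

If this is the cause, adding a small constant inside the square root of the loss would be the analogous change. Separately, note that y_actual[0] selects the first sample of the batch, whereas y_actual[:, 0] would select the first output column.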

Related

How to embed a manual loss function in keras model

I have a desired loss function as:
one_weight = (1-num_of_ones)/(num_of_ones + num_of_zeros)
zero_weight = (1-num_of_zeros)/(num_of_ones + num_of_zeros)
def weighted_binary_crossentropy(zero_weight, one_weight):
    def weighted_binary_crossentropy(y_true, y_pred):
        b_ce = K.binary_crossentropy(y_true, y_pred)
        # weighted calc
        weight_vector = y_true * one_weight + (1 - y_true) * zero_weight
        weighted_b_ce = weight_vector * b_ce
        return K.mean(weighted_b_ce)
    return weighted_binary_crossentropy
I'm trying to use this loss function in my model which is:
model = Sequential()
model.add(BatchNormalization())
model.add(Conv2D(16, kernel_size=(32,1),strides=(1,1), activation='relu', input_shape=(78,64,1)))
model.add(Conv2D(16, kernel_size=(1,10),strides=(1,10), activation='relu'))
model.add(BatchNormalization())
model.add(ReLU(max_value=None))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(2, activation='sigmoid'))
model.compile(optimizer=opt, loss = weighted_binary_crossentropy , metrics = ['acc'] )
history = model.fit(Train_Data, Train_labels, batch_size =20, epochs = 450, shuffle = True , validation_data = (Val_Data, Val_labels))
My question is this: the loss function takes y_pred (the labels the model predicts) as an input. y_pred only becomes available after the model has been trained with my loss function, yet the loss function needs y_pred while the model is training.
Put differently: when I train the model with this loss function it raises an error, because there is no y_pred to pass to the loss function.
How can I train the model with my loss function when I don't have y_pred before training starts? Note that I do have the other parameters the loss function requires.
Pass your own parameters to weighted_binary_crossentropy. It returns the inner wrapped function (also named weighted_binary_crossentropy), which accepts y_true and y_pred. Keras supplies those two arguments itself during training, so you don't need to do anything with them.
model.compile(optimizer=opt,
              loss=weighted_binary_crossentropy(zero_weight,one_weight),
              metrics=['acc'])
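The closure pattern in this answer can be sketched without Keras. In the minimal example below, make_weighted_bce is a hypothetical stand-in for the outer function, and plain Python lists stand in for tensors. The outer call fixes the weights once, and the returned function has the two-argument (y_true, y_pred) signature that a training loop calls on every batch.

```python
import math

def make_weighted_bce(zero_weight, one_weight):
    """Hypothetical factory: captures the weights, returns a 2-argument loss."""
    def loss(y_true, y_pred):
        total = 0.0
        for t, p in zip(y_true, y_pred):
            # Per-element binary cross-entropy.
            bce = -(t * math.log(p) + (1 - t) * math.log(1 - p))
            # Same weighting rule as in the question.
            weight = t * one_weight + (1 - t) * zero_weight
            total += weight * bce
        return total / len(y_true)
    return loss

# The weights are supplied by you, once, before training.
loss_fn = make_weighted_bce(zero_weight=1.0, one_weight=2.0)

# The training loop, not you, would make this call with each batch.
value = loss_fn([1, 0], [0.5, 0.5])
print(value)
```

The point of the design is that y_pred never has to exist before training: only the returned function needs it, and that function is called by the framework.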

Validation loss increase and constant training accuracy 1D cnn

I'm implementing a CNN for speech recognition. The input is MEL frequencies with shape (85314, 99, 1) and the labels are one-hot encoded with 35 output classes (shape: (85314, 35)). When I run the model, the training accuracy (image 2) starts high and stays the same across epochs, while the validation loss (image 1) increases. So it is probably overfitting, but I cannot find the origin of the issue. I already decreased the learning rate and tried different batch sizes, but the results stay the same. The amount of training data should also be sufficient. Is there another issue with my hyper-parameter settings somewhere?
My model and hyper-parameters are defined as follows:
#hyperparameters
input_dimension = 85314
learning_rate = 0.0000025
momentum = 0.85
hidden_initializer = random_uniform(seed=1)
dropout_rate = 0.2
# create model
model = Sequential()
model.add(Convolution1D(nb_filter=32, filter_length=3, input_shape=(99, 1), activation='relu'))
model.add(Convolution1D(nb_filter=16, filter_length=1, activation='relu'))
model.add(Flatten())
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu'))
model.add(Dense(35, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['acc'])
history = model.fit(frequencies_train, labels_hot, validation_split=0.2, epochs=10, batch_size=50)
You are using "binary_crossentropy" for a multi-class problem. Change it to "categorical_crossentropy".
With binary_crossentropy as the loss, Keras resolves the 'acc' metric to binary accuracy, which scores each of the 35 outputs independently, so the accuracy reported for a model with more than 2 labels is misleadingly high.
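To illustrate why the reported accuracy can look high and flat, here is a small framework-free sketch comparing an element-wise (binary) accuracy with an argmax-based (categorical) accuracy on a single one-hot label. The function names and the 4-class example are assumptions made for illustration, not Keras internals.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of output positions where the thresholded prediction matches."""
    hits = [int((p > threshold) == bool(t)) for t, p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)

def categorical_accuracy(y_true, y_pred):
    """1.0 if the predicted class (argmax) equals the true class, else 0.0."""
    return float(y_true.index(max(y_true)) == y_pred.index(max(y_pred)))

y_true = [0, 0, 1, 0]          # true class is index 2
y_pred = [0.4, 0.3, 0.2, 0.1]  # model favours index 0, which is wrong

bin_acc = binary_accuracy(y_true, y_pred)
cat_acc = categorical_accuracy(y_true, y_pred)
print(bin_acc, cat_acc)
```

Here the prediction is wrong, yet the element-wise score is 0.75 because three of the four zeros are matched. With 35 classes the same effect would give 34/35 for a completely wrong prediction.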

Getting new outputs after adding a Keras Lambda layer?

In the following code, I have XE, XW, YE, and YW of shapes (474077, 32), (474077, 32), (474077, 1), and (474077, 1), respectively.
After separately training modelE and modelW on 32 inputs and 1 output each, I add a Lambda layer that minimizes the difference between the outputs of both models. This code ran without errors.
I'm assuming this Lambda layer updates the weights and biases of modelE and modelW to minimize the difference between their outputs. How do I use the new updated weights and biases of modelE and modelW to predict their new outputs? I want to compare the initial outputs of the models and their outputs after the Lambda layer minimized the difference between them.
XtrainE, XtestE, YtrainE, YtestE = train_test_split(XE, YE, test_size=.5)
XtrainW, XtestW, YtrainW, YtestW = train_test_split(XW, YW, test_size=.5)
modelE = Sequential()
modelE.add(Dense(50, activation='relu', input_dim=32))
modelE.add(Dense(20, activation='relu'))
modelE.add(Dense(1, activation='relu'))
modelW = Sequential()
modelW.add(Dense(50, activation='relu', input_dim=32))
modelW.add(Dense(20, activation='relu'))
modelW.add(Dense(1, activation='relu'))
modelE.compile(loss='mse', optimizer='rmsprop')
modelW.compile(loss='mse', optimizer='rmsprop')
historyE= modelE.fit(XtrainE, YtrainE, validation_data=(XtestE,YtestE), epochs=200, batch_size=100, verbose=1)
historyW= modelW.fit(XtrainW, YtrainW, validation_data=(XtestW,YtestW), epochs=200, batch_size=100, verbose=1)
YpredE = modelE.predict(XtestE)
YpredW = modelW.predict(XtestW)
difference = Lambda(lambda x: x[0] - x[1])([modelE.output, modelW.output])
diffModel = Model(modelE.inputs + modelW.inputs, difference)
diffModel.compile(optimizer = 'adam', loss='mse')
diffModel.fit([XE,XW], np.zeros(YE.shape), epochs=200, batch_size=100, verbose=1)
I tried:
YpredWnew = modelW.predict(XtestW)
YpredEnew = modelE.predict(XtestE)
for i in range(len(YpredWnew)):
    print("oldE= %.2f, newE= %.2f, oldW= %.2f, newW= %.2f," % (YpredE[i], YpredEnew[i], YpredW[i], YpredWnew[i]))
but this gives back the same value for all i in YpredEnew[i]
Thanks

Convolutional neural net testing accuracy stays constant after each epoch

I see improving training accuracies after each iteration, but testing accuracy stays fixed at exactly 0.7545 after each epoch. I understand hitting a ceiling on accuracy at some point but don't understand why I don't at least see slight variations in accuracies (up or down). I'm training on about 800 images total.
Things I've tried:
- Switch to SGD optimizer.
- Start with learning rate of 0.01 and reduce until 0.00000001.
- Remove regularization layers.
#PARAMS
dropout_prob = 0.2
activation_function = 'relu'
loss_function = 'categorical_crossentropy'
verbose_level = 1
convolutional_batches = 32
convolutional_epochs = 10
inp_shape = X_train.shape[1:]
num_classes = 3
opt = SGD(lr=0.00001)
opt2 = 'adam'
def train_convolutional_neural():
    y_train_cat = np_utils.to_categorical(y_train, 3)
    y_test_cat = np_utils.to_categorical(y_test, 3)
    model = Sequential()
    model.add(Conv2D(filters=16, kernel_size=(3, 3), input_shape=inp_shape))
    model.add(Conv2D(filters=32, kernel_size=(3, 3)))
    model.add(MaxPooling2D(pool_size = (2,2)))
    model.add(Dropout(rate=dropout_prob))
    model.add(Flatten())
    #model.add(Dense(64,activation=activation_function))
    model.add(Dropout(rate=dropout_prob))
    model.add(Dense(32,activation=activation_function))
    model.add(Dense(num_classes,activation='softmax'))
    model.summary()
    model.compile(loss=loss_function, optimizer=opt, metrics=['accuracy'])
    history = model.fit(X_train, y_train_cat, batch_size=convolutional_batches, epochs = convolutional_epochs, verbose = verbose_level, validation_data=(X_test, y_test_cat))
    model.save('./models/convolutional_model.h5')
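One check that may help here, offered as an assumption since the post does not confirm it, is whether 0.7545 equals the share of the most frequent class in the test labels. If it does, the model is probably predicting that one class for every test image. A minimal sketch, where majority_baseline and example_labels are hypothetical:

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical integer class labels; substitute the real y_test values.
example_labels = [0, 0, 0, 1, 2, 0, 0, 1]
baseline = majority_baseline(example_labels)
print(baseline)
```

Comparing this number with the fixed test accuracy distinguishes a model that has collapsed to a single class from one that has genuinely plateaued.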

Why does a binary Keras CNN always predict 1?

I want to build a binary classifier using a Keras CNN.
I have about 6000 rows of input data which looks like this:
>> print(X_train[0])
[[[-1.06405307 -1.06685851 -1.05989663 -1.06273152]
[-1.06295958 -1.06655996 -1.05969803 -1.06382503]
[-1.06415248 -1.06735609 -1.05999593 -1.06302975]
[-1.06295958 -1.06755513 -1.05949944 -1.06362621]
[-1.06355603 -1.06636092 -1.05959873 -1.06173742]
[-1.0619655 -1.06655996 -1.06039312 -1.06412326]
[-1.06415248 -1.06725658 -1.05940014 -1.06322857]
[-1.06345662 -1.06377347 -1.05890365 -1.06034568]
[-1.06027557 -1.06019084 -1.05592469 -1.05537518]
[-1.05550398 -1.06038988 -1.05225064 -1.05676692]]]
>>> print(y_train[0])
[1]
Then I built a CNN this way:
model = Sequential()
model.add(Convolution1D(input_shape = (10, 4),
                        nb_filter=16,
                        filter_length=4,
                        border_mode='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(Convolution1D(nb_filter=8,
                        filter_length=4,
                        border_mode='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(64))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(1))
model.add(Activation('softmax'))
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.9, patience=30, min_lr=0.000001, verbose=0)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    nb_epoch = 100,
                    batch_size = 128,
                    verbose=0,
                    validation_data=(X_test, y_test),
                    callbacks=[reduce_lr],
                    shuffle=True)
y_pred = model.predict(X_test)
But it returns the following:
>> print(confusion_matrix(y_test, y_pred))
[[ 0 362]
[ 0 608]]
Why are all predictions ones? Why does the CNN perform so badly?
Here are the loss and acc charts:
It always predicts one because of the output layer of your network. You have a Dense layer with one neuron, followed by a softmax activation. Softmax divides the exponential of each output by the sum of the exponentials of all outputs. Since there is only one output, the only possible value is 1.0.
For a binary classifier you can either use a sigmoid activation with the "binary_crossentropy" loss, or put two output units at the last layer, keep using softmax and change the loss to categorical_crossentropy.
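The arithmetic behind this answer can be checked with a few lines of plain Python. The sketch below uses hand-written softmax and sigmoid helpers (not the Keras implementations) to show that a one-element softmax returns 1.0 for any input, while a sigmoid on the same input varies.

```python
import math

def softmax(logits):
    """Softmax over a list of raw scores, shifted by the max for stability."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function mapping a raw score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

raw_scores = (-3.0, 0.0, 5.0)

# One output unit: the exponential is divided by itself.
single_unit = [softmax([z])[0] for z in raw_scores]
# Same raw scores through a sigmoid give distinct probabilities.
sigmoid_out = [sigmoid(z) for z in raw_scores]

print(single_unit)
print(sigmoid_out)
```

With two output units, softmax would again depend on the inputs, which is why the second option in the answer also works.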
