I have a desired loss function as:
one_weight = 1 - num_of_ones / (num_of_ones + num_of_zeros)
zero_weight = 1 - num_of_zeros / (num_of_ones + num_of_zeros)
from keras import backend as K

def weighted_binary_crossentropy(zero_weight, one_weight):
    def weighted_binary_crossentropy(y_true, y_pred):
        b_ce = K.binary_crossentropy(y_true, y_pred)
        # weighted calc
        weight_vector = y_true * one_weight + (1 - y_true) * zero_weight
        weighted_b_ce = weight_vector * b_ce
        return K.mean(weighted_b_ce)
    return weighted_binary_crossentropy
I'm trying to use this loss function in my model which is:
model = Sequential()
model.add(BatchNormalization())
model.add(Conv2D(16, kernel_size=(32,1),strides=(1,1), activation='relu', input_shape=(78,64,1)))
model.add(Conv2D(16, kernel_size=(1,10),strides=(1,10), activation='relu'))
model.add(BatchNormalization())
model.add(ReLU(max_value=None))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(2, activation='sigmoid'))
model.compile(optimizer=opt, loss = weighted_binary_crossentropy , metrics = ['acc'] )
history = model.fit(Train_Data, Train_labels, batch_size =20, epochs = 450, shuffle = True , validation_data = (Val_Data, Val_labels))
My question is this: the loss function takes y_pred (the labels the model predicts) as an input. y_pred only becomes available once the model has been trained with this loss function, yet the loss function needs y_pred during training.
In other words, when I train the model with this loss function it raises an error, because there is no y_pred to pass to the loss function.
How can I use this loss function for training when I don't have y_pred before training starts? Note that I do have the other parameters the loss function requires.
Pass your own parameters to weighted_binary_crossentropy. It returns the inner wrapped function (also named weighted_binary_crossentropy), which accepts y_true and y_pred. Keras supplies those two arguments itself for every batch during training, so you don't need to do anything with them.
model.compile(optimizer=opt,
              loss=weighted_binary_crossentropy(zero_weight, one_weight),
              metrics=['acc'])
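To make the closure pattern concrete, here is a minimal plain-Python sketch with no Keras dependency. The names make_weighted_bce and fake_fit_step are hypothetical, the weights 0.1 and 0.9 are made-up values, and fake_fit_step only stands in for what a training loop does internally: it is the framework, not the caller, that hands y_true and y_pred to the inner function.

```python
import math

def make_weighted_bce(zero_weight, one_weight):
    """Outer function: captures the class weights once, before training."""
    def loss(y_true, y_pred):
        # Inner function: has the (y_true, y_pred) signature a framework expects.
        eps = 1e-7  # keeps log() away from 0
        total = 0.0
        for t, p in zip(y_true, y_pred):
            p = min(max(p, eps), 1.0 - eps)
            bce = -(t * math.log(p) + (1 - t) * math.log(1 - p))
            weight = t * one_weight + (1 - t) * zero_weight
            total += weight * bce
        return total / len(y_true)
    return loss

def fake_fit_step(loss_fn, labels, predictions):
    # Hypothetical stand-in for one training step: y_pred is produced and
    # passed in by the training loop, never by the user.
    return loss_fn(labels, predictions)

loss_fn = make_weighted_bce(zero_weight=0.1, one_weight=0.9)
batch_loss = fake_fit_step(loss_fn, [1, 0, 1], [0.8, 0.3, 0.6])
print(batch_loss)
```

Calling the outer function is what turns "a function of four things" into "a function of y_true and y_pred", which is the only shape of loss that model.compile accepts.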
def create_model():
    model = Sequential()
    model.add(LSTM(50, return_sequences=True, input_shape=(40002, 12)))
    model.add(LSTM(50, return_sequences=True))
    model.add(LSTM(50, return_sequences=True))
    model.add(tf.keras.layers.LSTM(30))
    model.add(Dense(2, activation='linear'))

    def rmse(Y_test, prediction):
        return K.sqrt(K.mean(K.square(Y_test - prediction)))

    # compile
    model.compile(optimizer='adam', loss=rmse, metrics=['mean_squared_error', rmse])
    return model

# fit the model
model = create_model()
model.fit(x_train, Y_train, shuffle=False, verbose=1, epochs=10)

# predict model
prediction = model.predict(x_test, verbose=0)
print(prediction)
How do I calculate the mean relative error for tensor inputs, i.e. when my Y_test and prediction are tensors?
Each row of Y_test and prediction holds 2 values.
Example:
Y_test = [[0.2, 0.003],
          [0.3, 0.008]]
prediction = [[0.4, 0.005],
              [0.5, 0.007]]
mean_relative_error = [mean(absolute(0.2-0.4)/0.2, absolute(0.003-0.005)/0.003), mean(absolute(0.3-0.5)/0.3, absolute(0.008-0.007)/0.008)]
mean_relative_error = [0.8333, 0.3958]
Please note that I don't want to use it for backpropagation to improve the network.
I would do it like this:
import tensorflow as tf
relative_error = tf.math.reduce_sum(tf.math.abs(prediction - Y_test) / prediction, axis=1)
# [0.9, 0.54285717]
mean_relative_error = tf.math.reduce_mean(relative_error)
# 0.7214286
I couldn't use tf.keras.losses.MeanAbsoluteError(reduction=tf.keras.losses.Reduction.NONE) because of a bug: MeanAbsoluteError still reduces to the mean despite being told not to. The bug is reported HERE.
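For comparison, the definition given in the question divides by the true values and averages within each row, whereas the snippet above divides by the predictions and sums. A plain-Python sketch of the question's definition, run on the question's own example, might look like this; the function name mean_relative_error_per_row is hypothetical, and nested lists stand in for tensors.

```python
def mean_relative_error_per_row(y_true, y_pred):
    """Per-row mean of |y_true - y_pred| / |y_true|."""
    result = []
    for true_row, pred_row in zip(y_true, y_pred):
        rel = [abs(t - p) / abs(t) for t, p in zip(true_row, pred_row)]
        result.append(sum(rel) / len(rel))
    return result

Y_test = [[0.2, 0.003], [0.3, 0.008]]
prediction = [[0.4, 0.005], [0.5, 0.007]]

per_row = mean_relative_error_per_row(Y_test, prediction)
print([round(v, 4) for v in per_row])  # [0.8333, 0.3958]
```

The same arithmetic carries over to tensors by taking the elementwise absolute difference, dividing by Y_test, and averaging over axis 1. Note that the result is undefined wherever a true value is 0.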
I am trying to use a custom loss function for my model. I scale the y values beforehand and inverse-scale them inside the loss function (using the answer from "scaling back data in customized keras training loss function"). After a random number of epochs the loss becomes NaN, and mean_absolute_error, val_mean_absolute_error and val_loss are all NaN as well. Here's my model and custom loss function:
model = Sequential()
model.add(LSTM(units=512, activation="tanh", return_sequences=True, input_shape=(X_train.shape[1],X_train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=256, activation="tanh", return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=128, activation="tanh", return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=64, activation="tanh"))
model.add(Dropout(0.2))
model.add(Dense(units = 2))
model.compile(optimizer = "Adam", loss = my_loss_function , metrics=['mean_absolute_error'])
model.summary()
I have 2 outputs as you can see.
def my_loss_function(y_actual, y_predicted):
    y_actual = (y_actual - K.constant(y_scaler.min_)) / K.constant(y_scaler.scale_)
    y_predicted = (y_predicted - K.constant(y_scaler.min_)) / K.constant(y_scaler.scale_)
    a_loss = abs(y_actual[0]-y_predicted[0])*128000
    b_loss = abs(y_actual[1]-y_predicted[1])*27000
    loss = tf.math.sqrt(tf.square(a_loss) + tf.square(b_loss))
    return loss
y_scaler is used earlier:
y_scaler = MinMaxScaler(feature_range = (0, 1))
y_scaler.fit(y_data)
y_data=y_scaler.transform(y_data)
y_testdata=y_scaler.transform(y_testdata)
Can anyone help?
When I use MSE, MAE etc. it works fine
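The inverse scaling done in the first two lines of the loss can be checked in isolation. The sketch below is a simplified, single-feature, plain-Python imitation of min-max scaling, not scikit-learn itself. The helper names fit_min_max, transform and inverse_transform and the sample values are hypothetical, while scale and offset are meant to mirror the scaler's scale_ and min_ attributes.

```python
def fit_min_max(values, feature_range=(0.0, 1.0)):
    """Return (scale, offset) so that scaled = value * scale + offset."""
    lo, hi = feature_range
    data_min, data_max = min(values), max(values)
    scale = (hi - lo) / (data_max - data_min)
    offset = lo - data_min * scale
    return scale, offset

def transform(values, scale, offset):
    return [v * scale + offset for v in values]

def inverse_transform(scaled, scale, offset):
    # Same algebra as (y - y_scaler.min_) / y_scaler.scale_ in the loss.
    return [(s - offset) / scale for s in scaled]

y = [120.0, 480.0, 300.0]  # made-up targets in original units
scale, offset = fit_min_max(y)
scaled = transform(y, scale, offset)
restored = inverse_transform(scaled, scale, offset)
print(scaled)
print(restored)
```

Up to floating-point rounding, restored matches y, which is the property the custom loss relies on when it converts predictions back to the original units before weighting them.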
For a regression task I'd like to customize the loss function to output a certainty measure additionally.
The initial normal network would be:
model = Sequential()
model.add(Dense(15, input_dim=7, kernel_initializer='normal'))
model.add(Dense(1, kernel_initializer='normal'))
model.compile(loss='mean_squared_error', optimizer='adam')
I'd like to add a certainty indicator sigma to the loss function, so that, depending on how accurate the predictions are, different values of sigma minimize the loss.
loss = (y_pred-y_true)^2/(2*sigma^2) + log(sigma)
The final outputs of the NN would then be y_pred and sigma.
I'm a bit lost in the implementation (new to keras):
Where would we initialize/store sigma so that it is updated for recurring, similar datapoints during training?
How do we connect the variable sigma from the loss function to the second NN output?
My current base structure, where I'm obviously missing the pieces:
def custom_loss(y_true, y_pred, sigma):
    loss = pow((y_pred - y_true), 2)/(2 * pow(sigma, 2))+math.log(sigma)
    return loss, sigma
model = Sequential()
model.add(Dense(15, input_dim=7, kernel_initializer='normal'))
model.add(Dense(2, kernel_initializer='normal'))
model.compile(loss=custom_loss, optimizer='adam')
Any tips/guidances are highly appreciated. Thanks!
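The key is to extend y_pred from a scalar to a vector:
import tensorflow as tf

def custom_loss(y_true, y_pred):
    # Column 0 is the prediction, column 1 is sigma; index columns, not rows.
    mu, sigma = y_pred[:, 0], y_pred[:, 1]
    err = mu - tf.reshape(y_true, [-1])
    # log(sigma) needs sigma > 0, which a linear output does not guarantee.
    return tf.square(err) / (2 * tf.square(sigma)) + tf.math.log(sigma)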
model = Sequential()
model.add(Dense(15, input_dim=7, kernel_initializer='normal'))
model.add(Dense(2, kernel_initializer='normal'))
model.compile(loss=custom_loss, optimizer='adam')
The model then returns sigma alongside the prediction.
Y = model.predict(X) # Y[:, 0] = prediction, Y[:, 1] = sigma
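Why this loss yields a usable certainty measure can be seen numerically: for a fixed prediction error, the loss is smallest when sigma equals the size of that error. The plain-Python sketch below searches a grid of candidate sigmas. The names gaussian_nll and best_sigma, the grid, and the two error values are all illustrative assumptions.

```python
import math

def gaussian_nll(y_true, y_pred, sigma):
    """(y_pred - y_true)^2 / (2 * sigma^2) + log(sigma), for sigma > 0."""
    return (y_pred - y_true) ** 2 / (2 * sigma ** 2) + math.log(sigma)

def best_sigma(error, candidates):
    # Brute-force search: which candidate sigma gives the lowest loss
    # when the prediction is off by `error`?
    return min(candidates, key=lambda s: gaussian_nll(0.0, error, s))

grid = [i / 100 for i in range(1, 501)]  # 0.01, 0.02, ..., 5.00

small_err_sigma = best_sigma(0.5, grid)
large_err_sigma = best_sigma(2.0, grid)
print(small_err_sigma, large_err_sigma)  # 0.5 2.0
```

Setting the derivative with respect to sigma to zero gives the same result analytically: -error^2 / sigma^3 + 1 / sigma = 0, so sigma = |error|. A network trained with this loss is therefore pushed to output larger sigma where its predictions tend to be less accurate.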
In the following code, I have XE, XW, YE, and YW of shapes (474077, 32), (474077, 32), (474077, 1), and (474077, 1), respectively.
After separately training modelE and modelW on 32 inputs and 1 output each, I add a Lambda layer that minimizes the difference between the outputs of both models. This code ran without errors.
I'm assuming this Lambda layer updates the weights and biases of modelE and modelW to minimize the difference between their outputs. How do I use the new updated weights and biases of modelE and modelW to predict their new outputs? I want to compare the initial outputs of the models and their outputs after the Lambda layer minimized the difference between them.
XtrainE, XtestE, YtrainE, YtestE = train_test_split(XE, YE, test_size=.5)
XtrainW, XtestW, YtrainW, YtestW = train_test_split(XW, YW, test_size=.5)
modelE = Sequential()
modelE.add(Dense(50, activation='relu', input_dim=32))
modelE.add(Dense(20, activation='relu'))
modelE.add(Dense(1, activation='relu'))
modelW = Sequential()
modelW.add(Dense(50, activation='relu', input_dim=32))
modelW.add(Dense(20, activation='relu'))
modelW.add(Dense(1, activation='relu'))
modelE.compile(loss='mse', optimizer='rmsprop')
modelW.compile(loss='mse', optimizer='rmsprop')
historyE= modelE.fit(XtrainE, YtrainE, validation_data=(XtestE,YtestE), epochs=200, batch_size=100, verbose=1)
historyW= modelW.fit(XtrainW, YtrainW, validation_data=(XtestW,YtestW), epochs=200, batch_size=100, verbose=1)
YpredE = modelE.predict(XtestE)
YpredW = modelW.predict(XtestW)
difference = Lambda(lambda x: x[0] - x[1])([modelE.output, modelW.output])
diffModel = Model(modelE.inputs + modelW.inputs, difference)
diffModel.compile(optimizer = 'adam', loss='mse')
diffModel.fit([XE,XW], np.zeros(YE.shape), epochs=200, batch_size=100, verbose=1)
I tried:
YpredWnew = modelW.predict(XtestW)
YpredEnew = modelE.predict(XtestE)
for i in range(len(YpredWnew)):
    print("oldE= %.2f, newE= %.2f, oldW= %.2f, newW= %.2f," % (YpredE[i], YpredEnew[i], YpredW[i], YpredWnew[i]))
But this gives back the same value for every i in YpredEnew[i].
Thanks
I want to build a binary classifier using a Keras CNN.
I have about 6000 rows of input data which looks like this:
>>> print(X_train[0])
[[[-1.06405307 -1.06685851 -1.05989663 -1.06273152]
[-1.06295958 -1.06655996 -1.05969803 -1.06382503]
[-1.06415248 -1.06735609 -1.05999593 -1.06302975]
[-1.06295958 -1.06755513 -1.05949944 -1.06362621]
[-1.06355603 -1.06636092 -1.05959873 -1.06173742]
[-1.0619655 -1.06655996 -1.06039312 -1.06412326]
[-1.06415248 -1.06725658 -1.05940014 -1.06322857]
[-1.06345662 -1.06377347 -1.05890365 -1.06034568]
[-1.06027557 -1.06019084 -1.05592469 -1.05537518]
[-1.05550398 -1.06038988 -1.05225064 -1.05676692]]]
>>> print(y_train[0])
[1]
I then built a CNN this way:
model = Sequential()
model.add(Convolution1D(input_shape=(10, 4),
                        nb_filter=16,
                        filter_length=4,
                        border_mode='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(Convolution1D(nb_filter=8,
                        filter_length=4,
                        border_mode='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(64))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(Dense(1))
model.add(Activation('softmax'))
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.9, patience=30, min_lr=0.000001, verbose=0)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    nb_epoch=100,
                    batch_size=128,
                    verbose=0,
                    validation_data=(X_test, y_test),
                    callbacks=[reduce_lr],
                    shuffle=True)
y_pred = model.predict(X_test)
But it returns the following:
>>> print(confusion_matrix(y_test, y_pred))
[[ 0 362]
[ 0 608]]
Why are all predictions ones? Why does the CNN perform so badly?
Here are the loss and accuracy charts: (images not included)
It always predicts one because of the output layer of your network. You have a Dense layer with one neuron followed by a softmax activation. Softmax divides the exponential of each output by the sum of the exponentials of all outputs. Since there is only one output, the only possible value is 1.0.
For a binary classifier you can either use a sigmoid activation with the "binary_crossentropy" loss, or put two output units at the last layer, keep using softmax and change the loss to categorical_crossentropy.
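The effect is easy to reproduce without Keras. The plain-Python sketch below applies a hand-written softmax and sigmoid to a single logit; both helper functions and the three logit values are written here only for illustration.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for logit in (-3.0, 0.0, 4.0):
    # With one unit, softmax is exp(z) / exp(z), so it is always 1.0.
    print(softmax([logit]), round(sigmoid(logit), 4))
```

Whatever the logit, the single-unit softmax prints [1.0], so every sample is assigned to class 1. The sigmoid of the same logit varies between 0 and 1, which is what binary_crossentropy needs.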