How to use mean_squared_error loss in tensorflow session - python

I am new to TensorFlow.
In part of the code for a TensorFlow session, there is:
loss = tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=net, labels=self.out_placeholder, name='cross_entropy')
self.loss = tf.reduce_mean(loss, name='mean_squared_error')
I want to use the mean_squared_error loss function for this purpose. I found this loss function on the TensorFlow website:
tf.losses.mean_squared_error(
    labels,
    predictions,
    weights=1.0,
    scope=None,
    loss_collection=tf.GraphKeys.LOSSES,
    reduction=Reduction.SUM_BY_NONZERO_WEIGHTS
)
I need this loss function for a regression problem.
I tried:
loss = tf.losses.mean_squared_error(predictions=net, labels=self.out_placeholder)
self.loss = tf.reduce_mean(loss, name='mean_squared_error')
Where net = tf.matmul(input_tensor, weights) + biases
However, I'm not sure if it's the correct way.

First of all, keep in mind that cross-entropy is mainly used for classification, while MSE is used for regression.
In your case, cross-entropy measures the difference between two distributions: the real occurrences (the labels) and your predictions.
So while the first loss function works on the result of the softmax layer (which can be seen as a probability distribution), the second one works directly on the floating-point output of your network (which is not a probability distribution), therefore they cannot simply be exchanged.
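For the regression case, the usage in your attempt is already going in the right direction. A minimal sketch of the wiring (assuming the same TF 1.x session-style setup and the tensors named in your question):
import tensorflow as tf

# input_tensor, weights, biases and self.out_placeholder are assumed to be
# defined exactly as in the question.
net = tf.matmul(input_tensor, weights) + biases  # raw regression output, no softmax
self.loss = tf.losses.mean_squared_error(
    labels=self.out_placeholder, predictions=net)
# With the default reduction this op is already a scalar, so the extra
# tf.reduce_mean is a harmless no-op and can be dropped.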

Related

Keras Categorical Cross Entropy

I'm trying to wrap my head around the categorical cross entropy loss. Looking at the implementation of the cross entropy loss in Keras:
# scale preds so that the class probas of each sample sum to 1
output = output / math_ops.reduce_sum(output, axis, True)
# Compute cross entropy from probabilities.
epsilon_ = _constant_to_tensor(epsilon(), output.dtype.base_dtype)
output = clip_ops.clip_by_value(output, epsilon_, 1. - epsilon_)
return -math_ops.reduce_sum(target * math_ops.log(output), axis)
I do not see where delta = output - target is calculated.
See here.
What am I missing?
I think you might be confusing two different concepts / events here.
The categorical cross entropy loss is a measure of the error of your model, as calculated by:
def categorical_crossentropy(target, output, from_logits=False, axis=-1):
    <etc>
This just returns an array of losses, one per label; it is a direct measure of the difference between the true label and what your model thinks the label should be.
The next step after calculating the loss (part of the forward propagation phase) is to then start backpropagation, i.e. we want to find the influence that each weight/bias matrix has on the loss you've calculated above, so that we can perform the update step.
The first step is then to calculate dL/dz, i.e. the derivative of the loss function with respect to the linear function (z = Wx + b), which decomposes as dL/dz = dL/da * da/dz (the derivative of the loss w.r.t. the activation, times the derivative of the activation w.r.t. the linear function).
The link you posted gives the derivative of the activation function w.r.t. the linear function. This blog does a decent job of explaining how all the parts fit together; the activation function they use is a sigmoid rather than a softmax, but the overall pieces fit together in the same way.
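To see concretely where output - target shows up, here is a small numerical check (a sketch using tf.GradientTape rather than the Keras-internal code above): the gradient of softmax followed by categorical cross-entropy, taken with respect to the logits, is exactly softmax(logits) - target.
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
target = tf.constant([[1.0, 0.0, 0.0]])

with tf.GradientTape() as tape:
    tape.watch(logits)  # logits is a constant, so watch it explicitly
    probs = tf.nn.softmax(logits)
    loss = -tf.reduce_sum(target * tf.math.log(probs), axis=-1)

grad = tape.gradient(loss, logits)
print(grad)            # approx. [[-0.34,  0.24,  0.10]]
print(probs - target)  # same values: the "delta = output - target" you were looking for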

How to implement gradient ascent in a Keras DQN

I have built a Reinforcement Learning DQN with variable-length sequences as inputs, and positive and negative rewards calculated for actions. Some problem with my DQN model in Keras means that although the model runs, average rewards decrease over time, over single and multiple cycles of epsilon. This does not change even after a significant period of training.
My thinking is that this is due to using MeanSquaredError in Keras as the loss function (minimising error). So I am trying to implement gradient ascent (to maximise reward). How can I do this in Keras? My current model is:
model = Sequential()
inp = (env.NUM_TIMEPERIODS, env.NUM_FEATURES)
model.add(Input(shape=inp))  # a shape tuple (integers), not including batch size
model.add(Masking(mask_value=0., input_shape=inp))
model.add(LSTM(env.NUM_FEATURES, input_shape=inp, return_sequences=True))
model.add(LSTM(env.NUM_FEATURES))
model.add(Dense(env.NUM_FEATURES))
model.add(Dense(4))
model.compile(loss='mse',
              optimizer=Adam(lr=LEARNING_RATE, decay=DECAY),
              metrics=[tf.keras.losses.MeanSquaredError()])
In trying to implement gradient ascent, by 'flipping' the gradient (as negative or inverse loss?), I have tried various loss definitions:
loss=-'mse'
loss=-tf.keras.losses.MeanSquaredError()
loss=1/tf.keras.losses.MeanSquaredError()
but these all generate bad operand [for unary] errors.
How can I adapt the current Keras model to maximise rewards?
Or is this gradient ascent not even the problem? Could it be some issue with the action policy?
You can do this by writing a custom loss function.
Here is the loss function you want:
@tf.function
def positive_mse(y_true, y_pred):
    return -1 * tf.keras.losses.MSE(y_true, y_pred)
And then your compile line becomes
model.compile(loss=positive_mse,
              optimizer=Adam(lr=LEARNING_RATE, decay=DECAY),
              metrics=[tf.keras.losses.MeanSquaredError()])
Please note: use loss=positive_mse and not loss=positive_mse(). That is not a typo: you need to pass the function itself, not the result of calling it.
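As a quick sanity check, this is what the wiring looks like end to end (a hypothetical toy model and random data, TF 2.x Keras assumed; only the shapes matter here):
import numpy as np
import tensorflow as tf

def positive_mse(y_true, y_pred):
    return -1 * tf.keras.losses.MSE(y_true, y_pred)

model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(3,))])
model.compile(loss=positive_mse,  # the function object, no parentheses
              optimizer=tf.keras.optimizers.Adam(),
              metrics=[tf.keras.losses.MeanSquaredError()])

x = np.random.rand(8, 3).astype('float32')
y = np.random.rand(8, 4).astype('float32')
model.fit(x, y, epochs=1, verbose=0)  # trains; the optimizer now pushes MSE up, not down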

Use Hamming Distance Loss Function with Tensorflow GradientTape: no gradients. Is it not differentiable?

I'm using Tensorflow 2.1 and Python 3, creating my custom training model following the tutorial "Tensorflow - Custom training: walkthrough".
I'm trying to use Hamming Distance on my loss function:
import tensorflow as tf
import tensorflow_addons as tfa
def my_loss_hamming(model, x, y):
    global output
    output = model(x)
    return tfa.metrics.hamming.hamming_loss_fn(y, output, threshold=0.5, mode='multilabel')

def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        tape.watch(model.trainable_variables)
        loss_value = my_loss_hamming(model, inputs, targets)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)
When I call it:
loss_value, grads = grad(model, feature, label)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
The grads variable is a list of 38 None values.
And I get the error:
No gradients provided for any variable: ['conv1_1/kernel:0', ...]
Is there any way to use Hamming distance without "interrupting the gradient chain registered by the gradient tape"?
Apologies if I'm saying something obvious, but the way backpropagation works as a fitting algorithm for neural networks is through gradients: for each batch of training data you compute how much the loss function will improve or degrade if you move a particular trainable weight by a very small amount delta.
Hamming loss is by definition not differentiable, so for small movements of trainable weights you will never see any change in the loss. I imagine it is only included to be used for final measurements of a trained model's performance rather than for training.
If you want to train a neural net through backpropagation you need to use some differentiable loss, one that can help the model move its weights in the right direction. Sometimes people use techniques to smooth such losses as the Hamming loss and create differentiable approximations, e.g. here it could be something that penalizes less those predictions which are closer to the target answer, rather than just giving out 1 for everything above the threshold and 0 for everything else.
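As an illustration (a sketch of my own, not something shipped in tensorflow_addons), one such smooth surrogate simply penalizes the distance of each sigmoid output from its 0/1 target instead of thresholding it; for binary targets this reduces to a mean absolute error, and it is differentiable:
import tensorflow as tf

def soft_hamming_loss(y_true, y_pred):
    # Differentiable stand-in for the Hamming loss on sigmoid outputs.
    y_true = tf.cast(y_true, y_pred.dtype)
    # y_true * (1 - y_pred) penalizes low scores on positive labels,
    # (1 - y_true) * y_pred penalizes high scores on negative labels.
    per_label = y_true * (1.0 - y_pred) + (1.0 - y_true) * y_pred
    return tf.reduce_mean(per_label)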

Custom loss function for predicting integer outputs?

I'm currently working on a dataset where I have to predict an integer output, ranging from 1 to N. I've built a network with the mse loss function, but I feel that mse may not be an ideal loss function to minimize in the case of integer outputs.
I also round my predictions to get integer outputs. Is there a way to make/optimize the model better in the case of integer outputs?
Can anyone provide some help on how to deal with integer outputs/targets? This is the loss function I'm using right now:
model.compile(optimizer=SGD(0.001), loss='mse')
You are using the wrong loss: mean squared error is a loss for regression, but you have a classification problem (discrete outputs, not continuous values).
So for this your model should have a softmax output layer:
model.add(Dense(N, activation="softmax"))
And you should be using a classification loss:
model.compile(optimizer=SGD(0.001), loss='sparse_categorical_crossentropy')
Assuming your labels are shifted to integers in the [0, N-1] range (yours are off by one, since they run from 1 to N), this should work. To make a prediction, you should do:
output = np.argmax(model.predict(some_data), axis=1) + 1
The +1 is because the model's integer class indices go from 0 to N-1, while your labels go from 1 to N.
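For completeness, a small sketch of the corresponding label shift at training time (x_train, y_train and some_data are placeholder names; y_train is assumed to hold integers in [1, N]):
import numpy as np

model.fit(x_train, y_train - 1, epochs=10)                # shift labels from [1, N] to [0, N-1]
output = np.argmax(model.predict(some_data), axis=1) + 1  # shift predictions back to [1, N]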
Ordinal regression could be an appropriate approach, in case predicting the wrong month but close to the true month is considered a smaller mistake than predicting a value one year earlier or later. Only you can know that, based on the specific problem you want to solve.
I found an implementation of the appropriate loss function on github (no affiliation). For completeness, below I copy-paste the code from that repo:
from keras import backend as K
from keras import losses

def loss(y_true, y_pred):
    weights = K.cast(
        K.abs(K.argmax(y_true, axis=1) - K.argmax(y_pred, axis=1)) / (K.int_shape(y_pred)[1] - 1),
        dtype='float32'
    )
    return (1.0 + weights) * losses.categorical_crossentropy(y_true, y_pred)
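A hedged usage sketch (my own, not from that repo): since the loss above applies K.argmax to y_true, the labels have to be one-hot encoded before training. Here model, N, x_train and y_train (integers in [1, N]) are placeholder names:
from keras.optimizers import SGD
from keras.utils import to_categorical

y_onehot = to_categorical(y_train - 1, num_classes=N)  # shift to [0, N-1], then one-hot
model.compile(optimizer=SGD(0.001), loss=loss)         # `loss` is the function defined above
model.fit(x_train, y_onehot, epochs=10)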

Diverging loss in Keras with custom loss

I have a fully connected feed-forward network implemented with Keras. Initially, I used binary cross-entropy as the loss and the metric, and the Adam optimizer, as follows:
adam = keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model.compile(optimizer=adam, loss='binary_crossentropy', metrics=['binary_crossentropy'])
This model trains well and gives good results. In order to get better results I want to use a different loss function and metric as below,
import keras.backend as K

def soft_bit_error_loss(yTrue, yPred):
    loss = K.pow(1 - yPred, yTrue) * K.pow(yPred, 1 - yTrue)
    return K.mean(loss)

def ber(yTrue, yPred):
    x_hat_train = K.cast(K.greater(yPred, 0.5), 'uint8')
    train_errors = K.cast(K.not_equal(K.cast(yTrue, 'uint8'), x_hat_train), 'float32')
    train_ber = K.mean(train_errors)
    return train_ber
I use it to compile my model as below
model.compile(optimizer=adam, loss=soft_bit_error_loss, metrics=[ber])
However, when I do that, the loss and the metric diverge after some training, every time, as in the following pictures.
What can be the cause of this?
Your loss function is very unstable. Writing it out with y_pred (the variable) replaced by x and y_true (the constant) replaced by c for simplicity, it is:
loss = (1 - x)^c * x^(1 - c)
As your predictions approach zero, the derivative of at least one of these power operations involves a term that tends to 1/0, which is infinite. Although by taking limits you could show the overall result is fine, Keras doesn't see the "whole" function as one expression; it computes derivatives from the basic operations used.
So, one easy solution is the one pointed out by @today:
loss = K.switch(K.equal(yTrue, 1), 1 - yPred, yPred)
It's exactly the same function (difference only when c is not zero or 1).
Also, even easier, for c=0 or c=1, it's just a plain loss='mae'.
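Wrapped up as a complete Keras loss, a sketch of that stable alternative could look like this (assuming the targets contain only 0s and 1s; model, adam and ber are the objects from the question):
import keras.backend as K

def stable_soft_bit_error_loss(y_true, y_pred):
    # K.switch with an element-wise boolean condition behaves like tf.where.
    per_element = K.switch(K.equal(y_true, 1.0), 1.0 - y_pred, y_pred)
    return K.mean(per_element)

model.compile(optimizer=adam, loss=stable_soft_bit_error_loss, metrics=[ber])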
