I am trying to use binary cross entropy for a binary classification problem and keep running into the following error. I have tried type casting as well as reshaping the tensor to shape [-1, 1], but nothing seems to work.
My last two layers are defined as:
dense_fin2 = tf.layers.dense(inputs = dense_fin, units = 128, name = "dense_fin2")
logits = tf.sigmoid(tf.layers.dense(inputs = dense_fin2, units = 1, name = "logits"))
Loss function,
loss = labels * -tf.log(logits) + (1 - labels) * -tf.log(1 - logits)
loss = tf.reduce_mean(loss)
Error thrown by TensorFlow:
ValueError: Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("Neg:0", shape=(?, 1), dtype=float32)'
Extra information:
I am using the Estimator API coupled with the Dataset API. I have integer labels, i.e. 0 or 1. They are NOT one-hot encoded. I understand this is doable by one-hot encoding my labels, but I do not want to take that path.
This error likely comes from trying to multiply integer-type labels with float-type logits. You can explicitly cast the labels to float via tf.cast(labels, dtype=tf.float32). Unfortunately your question does not reveal whether you tried this specific cast.
However, for reasons of numerical stability I would advise you to use tf.nn.sigmoid_cross_entropy_with_logits instead (or tf.losses.sigmoid_cross_entropy). There is also a correctness point here: in your snippet the tensor named logits already has tf.sigmoid applied, so it actually holds probabilities rather than logits. The manual formula is then mathematically fine but numerically fragile, since tf.log blows up for values near 0. If you switch to the built-in op, remove the tf.sigmoid from the output layer and feed it the raw pre-activation values, because the op applies the sigmoid internally; the built-in functions are preferable for stability.
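For concreteness, a minimal sketch of that fix might look like the following (the label reshape mirrors your [-1, 1] attempt and is an assumption about your label shape; the last layer now produces raw logits):

logits = tf.layers.dense(inputs=dense_fin2, units=1, name="logits")   # no tf.sigmoid here
labels_f = tf.cast(tf.reshape(labels, [-1, 1]), dtype=tf.float32)     # int 0/1 labels -> float
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels_f, logits=logits)
loss = tf.reduce_mean(loss)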
Related
I'm creating a model using the Keras functional API.
The layer architecture is as follows:
n = tf.keras.layers.Dense(1)(input)
for i in tf.range(n):
    output = tf.keras.layers.Dense(4)(input)
I then concatenate the outputs and return a tensor with shape [1, None, 4], where [1] is the batch dimension, [None] is n, and [4] is the output from the second dense layer.
My loss function involves comparing the shape of the output with the shape of the expected output, and then comparing the outputs themselves.
loss = tf.convert_to_tensor(abs(tf.shape(logits)[1] - tf.shape(expected)[1])) * 100.
When running this in a custom training loop, I'm getting the error:
ValueError: No gradients provided for any variable: (['while/dense/kernel:0',
'while/dense/bias:0', 'while/while/dense_1/kernel:0', 'while/while/dense_1/bias:0'],).
Provided `grads_and_vars` is ((None, <tf.Variable 'while/dense/kernel:0' shape=(786432, 1)
Shape is not differentiable; you cannot do things like this with gradient-based learning. Problems like this need to be tackled with more powerful tools, e.g. reinforcement learning, where one treats n as an action and obtains a policy gradient for it.
A rule of thumb to remember is that you cannot really backprop through discrete objects. You need to produce floats, as gradients require smooth functions. In your case n would have to be an integer (what would a loop over a float even mean?), so this should be your first warning sign. The other is the shape itself, which is also an integer. A target can be discrete, but a prediction cannot. Note that even in classification we do not output a class, we output probabilities, because probabilities are smooth.
You could instead build your model by assuming some maximum number N, treat it more like a classification problem where you supervise N directly, and use some form of masking to keep all the results around, as in the sketch below.
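A rough sketch of that idea (N_MAX, the hidden size, and the two head names are assumptions, not part of the original model; the input width is taken from the kernel shape in the error message):

import tensorflow as tf

N_MAX = 16            # assumed upper bound on n
INPUT_DIM = 786432    # from the 'while/dense/kernel:0' shape in the error

inputs = tf.keras.Input(shape=(INPUT_DIM,))
h = tf.keras.layers.Dense(64, activation="relu")(inputs)
count_logits = tf.keras.layers.Dense(N_MAX)(h)        # supervise n directly as a class
rows = tf.keras.layers.Dense(N_MAX * 4)(h)
rows = tf.keras.layers.Reshape((N_MAX, 4))(rows)      # always shape [batch, N_MAX, 4]
model = tf.keras.Model(inputs, [count_logits, rows])

# In the loss, mask out rows beyond the true n, e.g.:
# mask = tf.sequence_mask(true_n, N_MAX, dtype=tf.float32)   # [batch, N_MAX]
# row_loss = tf.reduce_sum(per_row_loss * mask) / tf.reduce_sum(mask)

This keeps every operation differentiable: n is predicted as a distribution over classes, and the unused rows are simply masked out in the loss instead of being produced by a data-dependent loop.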
I am trying to train a U-Net with binary targets. The usual binary cross entropy loss does not perform well, since the labels are very imbalanced (many more 0 pixels than 1s), so I want to penalize false negatives more heavily. But TensorFlow doesn't have a ready-made weighted binary cross entropy, and since I didn't want to write a loss from scratch, I'm trying to use tf.nn.weighted_cross_entropy_with_logits. To be able to easily feed the loss to the model.compile function, I'm writing this wrapper:
def loss_wrapper(y, x):
    x = tf.cast(x, 'float32')
    loss = tf.nn.weighted_cross_entropy_with_logits(y, x, pos_weight=10)
    return loss
However, despite casting x to float, I'm still getting the error:
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type int32 of argument 'x'.
when the tf loss is called. Can someone explain what's happening?
If x represents your predictions, it probably already has type float32. I think you need to cast y, which is presumably your labels. So:
loss = tf.nn.weighted_cross_entropy_with_logits(tf.cast(y, dtype=tf.float32),x,pos_weight=10)
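Put back into the wrapper from the question, a minimal version might look like this (keeping the positional labels-then-logits order that tf.nn.weighted_cross_entropy_with_logits expects):

def loss_wrapper(y, x):
    y = tf.cast(y, tf.float32)   # the labels are the int32 tensor the Mul op complains about
    return tf.nn.weighted_cross_entropy_with_logits(y, x, pos_weight=10)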
I'm trying to recreate a transformer that was written in PyTorch and port it to TensorFlow. Everything was going pretty well until each framework's version of MultiHeadAttention started giving extremely different outputs. Both are implementations of multi-headed attention as described in the paper "Attention Is All You Need", so they should be able to achieve the same output.
I'm converting
self_attn = nn.MultiheadAttention(dModel, nheads, dropout=dropout)
to
self_attn = MultiHeadAttention(num_heads=nheads, key_dim=dModel, dropout=dropout)
For my tests, dropout is 0.
I'm calling them with:
self_attn(x,x,x)
where x is a tensor with shape=(10, 128, 50)
As expected from the documentation, the PyTorch version returns a tuple of two tensors (described there in terms of the target sequence length and embedding dimension), which I expected to both have dimensions [10, 128, 50].
I'm having trouble getting the TensorFlow version to do the same thing. With TensorFlow I only get one tensor back (of size [10, 128, 50]), and it looks like neither of the tensors returned by PyTorch.
Based on the TensorFlow documentation I should be getting something comparable.
How can I get them to operate the same way? I'm guessing I'm doing something wrong with TensorFlow, but I can't figure out what.
nn.MultiheadAttention by default outputs a tuple with two tensors:
attn_output -- result of self-attention operation
attn_output_weights -- attention weights averaged(!) over heads
At the same time, tf.keras.layers.MultiHeadAttention by default outputs only one tensor, attention_output (which corresponds to attn_output in PyTorch). The attention weights of all heads will also be returned if the parameter return_attention_scores is set to True, like:
output, scores = self_attn(x, x, x, return_attention_scores=True)
The scores tensor should also be averaged over the head axis to achieve full correspondence with PyTorch:
scores = tf.math.reduce_mean(scores, 1)
While rewriting, keep in mind that by default (as in the snippet in the question) nn.MultiheadAttention expects input in the form (seq_length, batch_size, embed_dim), whereas tf.keras.layers.MultiHeadAttention expects it in the form (batch_size, seq_length, embed_dim).
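Putting those two points together, a hedged sketch of a matching call (x is assumed to be in PyTorch's default (seq_len, batch, embed) layout, as in the question):

x_tf = tf.transpose(x, perm=[1, 0, 2])                 # -> (batch, seq_len, embed)
output, scores = self_attn(x_tf, x_tf, x_tf, return_attention_scores=True)
output = tf.transpose(output, perm=[1, 0, 2])          # back to (seq_len, batch, embed)
scores = tf.math.reduce_mean(scores, axis=1)           # average over heads, as PyTorch does

Even with matching shapes, the numbers will only agree if both layers carry the same (ported) weights, since each framework initializes its projections independently.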
I wanted to implement class weights for my 3-class classification problem.
I tried just directly adding the weights, which gives me an error when passing my model output and the labels to my loss:
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1,2,2]))
The error:
loss = criterion(out, labels)
expected scalar type Float but found Long
So I printed the dtypes and changed them to float, but it still gives me the same error:
labels = labels.float()
print("Labels Training", labels, labels.dtype)
print("Out Training ", out, out.dtype)
loss = criterion(out, labels)
>>Labels Training tensor([2.]) torch.float32
>>Out Training tensor([[ 0.0540, -0.1439, -0.0070]], grad_fn=<AddmmBackward0>) torch.float32
>>expected scalar type Float but found Long
I also tried to change it to float64(), but it tells me that the Tensor object has no attribute float64.
Second problem: I haven't tried this one out yet, but I have seen that the more common approach would be the WeightedRandomSampler. My problem is that I use cross-validation with K-Fold and use a SubsetRandomSampler for that. Is it possible to use both? I haven't found anything related to that.
For the first problem: nn.CrossEntropyLoss requires the output to be of type float, the labels to be of type long, and the weight to be of type float. Therefore, you should change the optional "weight" parameter of nn.CrossEntropyLoss to float like this:
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0,2.0,2.0]))
loss = criterion(out, labels.long())
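As a quick self-contained sanity check using the values printed in the question (a sketch, not your actual training loop):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 2.0]))   # float weights
out = torch.tensor([[0.0540, -0.1439, -0.0070]], requires_grad=True)    # float logits, shape [1, 3]
labels = torch.tensor([2])                                              # long class indices, not floats
loss = criterion(out, labels)
loss.backward()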
I'm trying to apply the concept of distillation, basically to train a new, smaller network that does the same as the original one but with less computation.
I have the softmax outputs for every sample instead of the logits.
My question is, how is the categorical cross entropy loss function implemented?
Does it take the maximum value of the original labels and multiply it by the corresponding predicted value at the same index, or does it do the summation over all of the outputs (one-hot encoding) as the formula says, loss = -sum_i(target_i * log(output_i))?
As an answer to "Do you happen to know what the epsilon and tf.clip_by_value is doing?":
it is ensuring that output != 0, because tf.log(0) evaluates to -inf, which would break the loss.
(I don't have points to comment but thought I'd contribute)
I see that you used the tensorflow tag, so I guess this is the backend you are using?
def categorical_crossentropy(output, target, from_logits=False):
    """Categorical crossentropy between an output tensor and a target tensor.

    # Arguments
        output: A tensor resulting from a softmax
            (unless `from_logits` is True, in which
            case `output` is expected to be the logits).
        target: A tensor of the same shape as `output`.
        from_logits: Boolean, whether `output` is the
            result of a softmax, or is a tensor of logits.

    # Returns
        Output tensor.
    """
This code comes from the Keras source code. Looking directly at the code should answer all your questions :) If you need more info, just ask!
EDIT:
Here is the code that interests you:
# Note: tf.nn.softmax_cross_entropy_with_logits
# expects logits, Keras expects probabilities.
if not from_logits:
    # scale preds so that the class probas of each sample sum to 1
    output /= tf.reduce_sum(output,
                            reduction_indices=len(output.get_shape()) - 1,
                            keep_dims=True)
# manual computation of crossentropy
epsilon = _to_tensor(_EPSILON, output.dtype.base_dtype)
output = tf.clip_by_value(output, epsilon, 1. - epsilon)
return - tf.reduce_sum(target * tf.log(output),
                       reduction_indices=len(output.get_shape()) - 1)
If you look at the return, they sum it... :)
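For a one-hot target the sum collapses to a single term, so the loss is just the negative log-probability assigned to the true class. An illustrative example (numbers made up):

import numpy as np

target = np.array([0., 1., 0.])           # one-hot label
output = np.array([0.2, 0.7, 0.1])        # softmax probabilities
loss = -np.sum(target * np.log(output))   # only the true-class term survives
# loss == -log(0.7), roughly 0.357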