I have defined a loss function like this:
def my_loss(y_recon, y_real, brain_hidden, brain_real):
    loss = torch.mean((y_recon - y_real)**2 + (brain_hidden - brain_real)**2)
    return loss
The shape of y_recon (and y_real) is batch_size*300, and the shape of brain_hidden (and brain_real) is batch_size*64.
I need to minimize both of these terms. However, written this way, I get the error:
The size of tensor a (300) must match the size of tensor b (64) at
non-singleton dimension 1
How can I update the loss function to avoid this error?
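One possible fix, sketched under the assumption that the two terms should contribute with equal weight (the question does not say): reduce each squared error to a scalar first, then add the scalars. The two differences have shapes batch_size*300 and batch_size*64, so they cannot be added elementwise, but their means can. The toy tensors below are illustrative stand-ins, not the asker's data.

```python
import torch

def my_loss(y_recon, y_real, brain_hidden, brain_real):
    # Reduce each term to a scalar before adding, so the 300-wide and
    # 64-wide tensors never have to broadcast against each other.
    recon_loss = torch.mean((y_recon - y_real) ** 2)
    brain_loss = torch.mean((brain_hidden - brain_real) ** 2)
    # Equal weighting is an assumption; scale either term if needed.
    return recon_loss + brain_loss

# Toy stand-ins with the shapes described in the question.
batch_size = 4
y_recon = torch.randn(batch_size, 300, requires_grad=True)
y_real = torch.randn(batch_size, 300)
brain_hidden = torch.randn(batch_size, 64, requires_grad=True)
brain_real = torch.randn(batch_size, 64)

loss = my_loss(y_recon, y_real, brain_hidden, brain_real)
loss.backward()  # gradients flow into both predictions
```

Because each mean is taken over its own tensor, both terms end up on a comparable per-element scale regardless of their widths.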
Related
I have a set of 256x256 images that are each labeled with nine binary 256x256 masks. I am trying to calculate the pos_weight in order to weight the BCEWithLogitsLoss using PyTorch.
The shape of my masks tensor is tensor([1000, 9, 256, 256]) where 1000 is the number of training images, 9 is the number of mask channels (all encoded to 0/1), and 256 is the size of each image side.
To calculate pos_weight, I counted the zeros in each mask channel and divided that number by the count of ones in the same channel (following the advice suggested here):
(masks[:,channel,:,:]==0).sum()/masks[:,channel,:,:].sum()
Calculating the weight for every mask channel provides a tensor with the shape of tensor([9]), which seems intuitive to me, since I want a pos_weight value for each of the nine mask channels. However when I try to fit my model, I get the following error message:
RuntimeError: The size of tensor a (9) must match the size of
tensor b (256) at non-singleton dimension 3
This error message is surprising because it suggests that the weights need to be the size of one of the image sides, but not the number of mask channels. What shape should pos_weight be and how do I specify that it should be providing weights for the mask channels instead of the image pixels?
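TL;DR: This is a broadcasting issue. pos_weight is broadcast against the trailing dimensions of the input, so a vector of length 9 is matched against the last (width, 256) dimension rather than the channel dimension. This case is surprisingly not handled by PyTorch's nn.BCEWithLogitsLoss, i.e. F.binary_cross_entropy_with_logits. It might actually be worth opening a GitHub issue linking to this SO thread to notify the developers of this undesirable behaviour.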
In the documentation page of nn.BCEWithLogitsLoss, it is stated that the provided positive weights tensor pos_weight:
Must be a vector with length equal to the number of classes.
This is of course what you were expecting (rightly so) since positive weights refer to the weight given to the positive instances for every single class. Since your prediction and target tensors are multi-dimensional this seems to not be handled properly by PyTorch.
Anyhow, here is a minimal example showing how you can bypass this error, along with the manual computation of the binary cross-entropy for reference.
Here is the setup of the prediction and target tensors pred and label respectively:
>>> c=2;b=5;h=3;w=3
>>> pred = torch.rand(b,c,h,w)
>>> label = torch.randint(0,2, (b,c,h,w), dtype=float)
Now for the definition of the positive weight, notice the trailing singleton dimensions:
>>> pos_weight = torch.rand(c,1,1)
In your case, with your existing 1D tensor of length c, you would simply have to unsqueeze two extra dimensions for the height and width dimensions. This means doing something like: pos_weight = pos_weight[:,None,None].
Calling the BCE-with-logits function (or its OOP equivalent, nn.BCEWithLogitsLoss):
>>> F.binary_cross_entropy_with_logits(pred, label, pos_weight=pos_weight).mean()
Which is equivalent, in plain code, to:
>>> z = torch.sigmoid(pred)
>>> bce = -(pos_weight*label*torch.log(z) + (1-label)*torch.log(1-z))
>>> bce.mean()
Note that the built-in function would have the desired behaviour (i.e. no error message) if the class dimension were last in your prediction and target tensors.
>>> pos_weight = torch.rand(c)
>>> F.binary_cross_entropy_with_logits(
... pred.transpose(1,-1),
... label.transpose(1,-1),
... pos_weight=pos_weight)
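In other words, we are applying the function with the class dimension last (a channels-last layout; strictly speaking, transpose(1,-1) yields NWHC rather than NHWC, which makes no difference to the mean), which means the pos_weight of format C can be broadcast properly. So the call above effectively yields the same result as: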
>>> F.binary_cross_entropy_with_logits(
... pred,
... label,
... pos_weight=pos_weight[:,None,None])
You can read more about the pos_weight in BCEWithLogitsLoss in another thread here
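Putting the pieces together for the shapes in the question, here is a minimal sketch. It assumes a much smaller stand-in for the (1000, 9, 256, 256) masks tensor and random logits in place of a real model output. The clamp that guards against a channel with no positives is an addition not discussed above.

```python
import torch

torch.manual_seed(0)

# Small stand-in for the (1000, 9, 256, 256) masks tensor.
n, c, h, w = 8, 9, 16, 16
masks = torch.randint(0, 2, (n, c, h, w)).float()
logits = torch.randn(n, c, h, w)  # placeholder for the model output

# Count negatives and positives per channel (reduce over N, H, W).
positives = masks.sum(dim=(0, 2, 3))
negatives = (masks == 0).sum(dim=(0, 2, 3))
# clamp avoids division by zero for an all-negative channel (assumption).
pos_weight = negatives / positives.clamp(min=1)   # shape (9,)

# Trailing singleton dimensions align the weights with the channel axis.
pos_weight = pos_weight[:, None, None]            # shape (9, 1, 1)

criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(logits, masks)

# Manual reference computation, as in the answer above.
z = torch.sigmoid(logits)
manual = -(pos_weight * masks * torch.log(z)
           + (1 - masks) * torch.log(1 - z)).mean()
```

The built-in loss and the manual expression should agree up to floating-point error, which is a quick way to check that the weights are being applied per channel.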
I want to write a custom loss function for a Keras network.
In this function, the result depends not only on y_actual and y_pred, but also on another value that I extract from a database using the value of y_actual.
I wrote the following function. In it, I want to use the label for y_actual, retrieved from the database, in the loss calculation.
def custom_loss(y_actual,y_pred):
    dataset_whole=pd.read_sql("select * from Records", con=db)
    dataset_features_only=dataset_whole.drop(['label'],axis=1)
    dataset_whole_np=dataset_whole.values
    dataset_whole_tf=tf.constant(dataset_whole_np)
    dataset_features_np=dataset_features_only.values
    dataset_features_tf=tf.constant(dataset_features_np)
    index=tf.where(tf.equal(dataset_features_tf, y_actual))
    row=dataset_whole_tf[index,:]
    label=row['label']
    return ((y_actual-y_pred)-tf.Variable(label,tf.float64))**
I see this error message in the row=dataset_whole_tf[index,:] line:
ValueError: Shapes must be equal rank, but are 2 and 0 From merging shape 0 with other shapes. for 'loss/dense_7_loss/strided_slice/stack_1' (op: 'Pack') with input shapes: [?,2], [].
I'm trying to train a model that takes n values as input and outputs n values. The problem is that n can be anywhere from 1 to 700, so I built a network with 700 inputs and 700 outputs. The extra inputs and outputs are set to zero.
When training the model, I don't care whether the extra outputs are accurate. So I tried to define my own loss function as follows:
def mse_truncate(y_true, y_pred):
    def fn(x):
        return tf.cond(x < 0.01, lambda: 0.0, lambda: 1.0)
    # Ignore the square error if y_true[i] is near zero
    sgn = tf.map_fn(fn, y_true)
    return K.mean(sgn * K.square(y_true - y_pred), axis=-1)
This function works in the console.
But when I compile the model, I get an error:
model.compile(optimizer='sgd',loss=mse_truncate, metrics=['accuracy'])
ValueError: Shape must be rank 0 but is rank 1 for 'loss_5/dense_2_loss/map/while/cond/Switch' (op: 'Switch') with input shapes: [?], [?].
Can someone tell me what's wrong here?
Or are there better ways to handle the variable length input and output?
Note:
More on the problem: the input is a sequence (length <= 700), and the output is the distance between the first element and each element in the sequence.
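You could use tf.where and tf.gather_nd to only take those values that you care about into consideration (tf.where returns one row of coordinates per selected element, so for a 2-D y_true it has to be paired with tf.gather_nd rather than tf.gather), e.g.:
indices = tf.where(tf.greater(y_true, 0.01)) # or `tf.less`, `tf.equal` etc.
loss = K.mean(K.square(tf.gather_nd(y_true, indices) - tf.gather_nd(y_pred, indices)))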
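An alternative sketch that keeps the per-sample shape Keras expects from a loss: build a 0/1 mask with a vectorized comparison instead of tf.cond inside tf.map_fn (tf.cond needs a scalar predicate, which is what the rank error in the question is complaining about). This assumes TensorFlow 2. The `threshold` argument and the choice to divide by the number of kept entries instead of by 700 are illustrative assumptions, not part of the original code.

```python
import tensorflow as tf

def mse_truncate(y_true, y_pred, threshold=0.01):
    y_true = tf.cast(y_true, y_pred.dtype)
    # 1.0 where the target is a real value, 0.0 where it is padding.
    mask = tf.cast(y_true >= threshold, y_pred.dtype)
    squared_error = mask * tf.square(y_true - y_pred)
    # Average over kept entries only; guard against an all-padding row.
    count = tf.maximum(tf.reduce_sum(mask, axis=-1), 1.0)
    return tf.reduce_sum(squared_error, axis=-1) / count

# Toy check: the last two positions are padding and must be ignored.
y_true = tf.constant([[1.0, 2.0, 0.0, 0.0]])
y_pred = tf.constant([[1.5, 2.0, 9.0, -3.0]])
loss = mse_truncate(y_true, y_pred)  # (0.25 + 0.0) / 2 = 0.125
```

The function takes (y_true, y_pred) with a defaulted third argument, so it can be passed to model.compile(loss=mse_truncate) in the same way as the original.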
I'm working in Keras with TensorFlow under the hood. I have a deep neural model (predictive autoencoder). I'm doing something somewhat similar to this: https://arxiv.org/abs/1612.00796 -- I'm trying to understand the influence of variables in a given layer on the output.
For this I need to find the 2nd derivative (Hessian) of the loss (L) with respect to the output of a particular layer (s), i.e. d²L/ds².
Diagonal entries would be sufficient. L is a scalar, s is 1 by n.
What I tried first:
dLds = tf.gradients(L, s) # works fine to get first order derivatives
d2Lds2 = tf.gradients(dLds, s) # throws an error
TypeError: Second-order gradient for while loops not supported.
I also tried:
d2Lds2 = tf.hessians(L, s)
ValueError: Computing hessians is currently only supported for one-dimensional tensors. Element number 0 of `xs` has 2 dimensions.
I cannot change the shape of s because it is part of the neural network (an LSTM's state). The first dimension (batch_size) is already set to 1, and I don't think I can get rid of it.
I cannot reshape s because that breaks the flow of the gradients, e.g.:
tf.gradients(L, tf.reduce_sum(s, axis=0))
gives:
[None]
Any ideas on what I can do in this situation?
This is not supported at the moment. See this report.
I have coded a neural network that returns a list of 3 numbers for every input sample. These values are then subtracted from the actual values to get the difference.
For example,
actual = [1,2,3]
predicted = [0,0,1]
diff = [1,2,2]
So my tensor now has the shape [batch_size, 3]
What I want to do is to iterate over the tensor elements to construct my loss function.
For instance, if my batch_size is 2 and finally
diff = [[a,b,c],[d,e,f]]
I want the loss to be
Loss = mean(sqrt(a^2+b^2+c^2), sqrt(d^2+e^2+f^2))
I know that TensorFlow has a tf.nn.l2_loss() function that computes the L2 loss of the entire tensor. But what I want is the mean of l2 losses of elements of a tensor along some axis.
How do I go about doing this?
You can use tf.square, then tf.reduce_sum along the component axis, then tf.sqrt, and finally tf.reduce_mean over the batch. Both tf.reduce_sum and tf.reduce_mean have an axis argument that indicates which dimensions to reduce.
For more reduction operations, see https://www.tensorflow.org/api_guides/python/math_ops#Reduction
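As a minimal sketch of that chain of reductions, assuming TensorFlow 2 and a hand-picked `diff` tensor of shape [batch_size, 3] chosen so the per-row norms are easy to verify:

```python
import tensorflow as tf

# Toy differences: row norms are sqrt(9 + 16 + 0) = 5 and sqrt(0 + 0 + 4) = 2.
diff = tf.constant([[3.0, 4.0, 0.0],
                    [0.0, 0.0, 2.0]])

# Square, sum over the 3 components of each sample, then take the root.
per_sample = tf.sqrt(tf.reduce_sum(tf.square(diff), axis=1))

# Average the per-sample L2 norms over the batch.
loss = tf.reduce_mean(per_sample)  # (5 + 2) / 2 = 3.5
```

tf.norm(diff, axis=1) computes the same per-sample values in a single call. Note that the gradient of the square root is unbounded at zero, so adding a small constant inside tf.sqrt may help if a row of diff can be exactly zero during training.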