I am defining a weighted mean squared error in Keras as follows:
def weighted_mse(yTrue, yPred):
    data_weights = [w0, w1, w2, w3]
    data_weights_np = np.asarray(data_weights, np.float32)
    weights = tf.convert_to_tensor(data_weights_np, np.float32)
    return K.mean(weights * K.square(yTrue - yPred))
I have a list of weights, one per prediction. The predictions have shape (25, 4), for example, generated by a final dense layer of dimension 4. I wish to weight these predictions in the mean squared error, so I create a tensor from the weights and multiply it with the squared error. Is this the correct way to do so?
I ask because when I print the shapes using tf.shape, for yTrue and yPred it shows:
Tensor("loss_19/dense_20_loss/Shape:0", shape=(3,), dtype=int32)
and for weights:
Tensor("loss_19/dense_20_loss/Shape_2:0", shape=(1,), dtype=int32)
The Keras API already provides a mechanism for passing weights, for example via model.fit. From the documentation:
class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
sample_weight: Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only). You can either pass a flat (1D) Numpy array with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile().
If you have a weight for each sample, you can pass the NumPy array as sample_weight to achieve the same effect without writing your own loss function.
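A minimal sketch of the sample_weight route (the x_train/y_train names and the doubling of class-1 samples are purely illustrative):

import numpy as np
# one weight per training sample; here samples labelled 1 count twice as much
sample_weights = np.where(y_train == 1, 2.0, 1.0).astype("float32")
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, sample_weight=sample_weights, epochs=10)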
Related
I'm creating a model using the Keras functional API.
The layer architecture is as follows:
n = tf.keras.layers.Dense(1)(input)
for i in tf.range(n):
    output = tf.keras.layers.Dense(4)(input)
I then concatenate the outputs, returning a tensor with shape [1, None, 4], where 1 is the batch dimension, None is n, and 4 is the output size of the second dense layer.
My loss function involves comparing the shape of the output against that of the expected output, as well as comparing the outputs themselves.
loss = tf.convert_to_tensor(abs(tf.shape(logits)[1] - tf.shape(expected)[1])) * 100.
When running this in a custom training loop, I get the error:
ValueError: No gradients provided for any variable: (['while/dense/kernel:0',
'while/dense/bias:0', 'while/while/dense_1/kernel:0', 'while/while/dense_1/bias:0'],).
Provided `grads_and_vars` is ((None, <tf.Variable 'while/dense/kernel:0' shape=(786432, 1)
Shape is not differentiable; you cannot do things like this with gradient-based learning. Problems like this need to be tackled with more powerful tools, e.g. reinforcement learning, where one treats n as an action and obtains a policy gradient for it.
A rule of thumb to remember is that you cannot really backprop through discrete objects. You need to produce floats, as gradients require smooth functions. In your case n has to be an integer (what would a loop over a float mean?), so this should be your first warning sign. The other is shape itself, which is also integer-valued. A target can be discrete, but a prediction cannot. Note that even in classification we do not output a class, we output a probability, because probability is smooth.
You could instead build your model by assuming some maximum value of N, treat predicting N as a classification problem that you supervise directly, and use some form of masking to keep all the outputs around while ignoring the unused ones.
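A minimal sketch of that idea, assuming a maximum of N_MAX outputs (all names and sizes here are illustrative):

import tensorflow as tf

N_MAX = 8  # assumed upper bound on n

inputs = tf.keras.Input(shape=(128,))
# classification head that predicts how many of the N_MAX slots are actually used
n_logits = tf.keras.layers.Dense(N_MAX, name="n_head")(inputs)
# always produce all N_MAX outputs of size 4; mask the unused slots in the loss
slots = tf.keras.layers.Dense(N_MAX * 4)(inputs)
slots = tf.keras.layers.Reshape((N_MAX, 4))(slots)
model = tf.keras.Model(inputs, [n_logits, slots])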
I'd like to compute the gradient of the loss wrt all the network params. The problem arises when I try to reshape each weight matrix to be one-dimensional (this is useful for computations that I do later with the gradients).
At this point TensorFlow outputs a list of None (which means there is no path from the loss to those tensors, while there should be, since they are just the model parameters reshaped).
Here is the code:
all_tensors = list()
for dir in ["fw", "bw"]:
    for mtype in ["kernel"]:
        t = tf.get_default_graph().get_tensor_by_name("encoder/bidirectional_rnn/%s/lstm_cell/%s:0" % (dir, mtype))
        all_tensors.append(t)
# classifier tensors:
for mtype in ["kernel", "bias"]:
    t = tf.get_default_graph().get_tensor_by_name("encoder/dense/%s:0" % (mtype))
    all_tensors.append(t)
all_tensors = [tf.reshape(x, [-1]) for x in all_tensors]
tf.gradients(self.loss, all_tensors)
At the end of the for loops, all_tensors is a list of 4 matrices of different shapes. This code outputs [None, None, None, None].
If I remove the reshape line all_tensors = [tf.reshape(x, [-1]) for x in all_tensors], the code works fine and returns 4 tensors containing the gradients wrt each param.
Why does it happen? I'm pretty sure that reshape doesn't break any dependency in the graph, otherwise it couldn't be used in any network at all.
Well, the fact is that there is no path from your reshaped tensors to the loss. If you think of the computation graph in TensorFlow, self.loss is defined through a series of operations that at some point use the tensors you are interested in. However, when you do:
all_tensors = [tf.reshape(x, [-1]) for x in all_tensors]
You are creating new nodes in the graph and new tensors that are not being used by anyone. Yes, there is a relationship between those tensors and the loss value, but from the point of view of TensorFlow that reshaping is an independent computation.
If you want to do something like that, you would have to do the reshaping first and then compute the loss using the reshaped tensors. Or, alternatively, you can just compute the gradients with respect to the original tensors and then reshape the result.
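A minimal sketch of the second alternative, taking the gradients with respect to the original variables and flattening them afterwards (assuming all_tensors holds the unreshaped tensors):

grads = tf.gradients(self.loss, all_tensors)       # gradients wrt the original tensors
flat_grads = [tf.reshape(g, [-1]) for g in grads]  # flatten each gradient to 1-D afterwards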
I am doing a semantic segmentation task using TensorFlow. I have 5 classes, and I calculate the loss like this:
loss = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=tf.squeeze(annotation, squeeze_dims=[3]), name="entropy")))
logits has shape (batch_size, picture_height, picture_width, 5)
annotation has shape (batch_size, picture_height, picture_width, 1)
Now I only want to calculate the loss of the first 4 classes, ignore the 5th class. How can I achieve this?
For example, if I only want to calculate Cohen's kappa for the first 4 classes, I can set the labels parameter in sklearn.metrics.cohen_kappa_score:
kappa = cohen_kappa_score(y_true, y_pred, labels=[0,1,2,3])
You can use the non-sparse version of the cross-entropy loss, tf.losses.softmax_cross_entropy, which accepts one-hot labels, and create the one-hot labels manually with tf.one_hot.
tf.one_hot accepts a depth argument that lets you keep only the first classes, or you can slice the resulting one-hot tensor before passing it to the loss.
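A minimal sketch of that approach, under the assumption that pixels labelled with the 5th class should simply contribute nothing to the loss (with depth=4 their one-hot rows become all zeros):

labels = tf.squeeze(annotation, axis=[3])        # (batch, height, width)
onehot = tf.one_hot(labels, depth=4)             # classes 0-3; class 4 becomes an all-zero row
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=tf.reshape(onehot, [-1, 4]),
    logits=tf.reshape(logits[..., :4], [-1, 4]))  # keep only the logits of the first 4 classes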
I am trying to log AUC during training of my model.
According to the documentation, tf.metrics.auc needs labels and predictions, both of the same shape.
But in my case of binary classification, the labels are a one-dimensional tensor containing just the classes, and the predictions are two-dimensional, containing a probability for each class of each data point.
How to calculate AUC in this case?
Let's have a look at the parameters in the function tf.metrics.auc:
labels: A Tensor whose shape matches predictions. Will be cast to bool.
predictions: A floating point Tensor of arbitrary shape and whose values are in the range [0, 1].
This operation already assumes a binary classification. That is, each element in labels states whether the class is "positive" or "negative" for a single sample. It is not a one-hot vector, which would require a vector with as many elements as there are exclusive classes.
Likewise, predictions represents the predicted binary class with some level of certainty (some people may call it a probability), and each element should also refer to one sample. It is not a softmax vector.
If the probabilities came from a neural network with a fully connected layer of 2 neurons and a softmax activation at the head of the network, consider replacing that with a single neuron and a sigmoid activation. The output can now be fed to tf.metrics.auc directly.
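A minimal sketch of that replacement (the features tensor and the layer call are illustrative):

logits = tf.layers.dense(features, units=1)               # a single output neuron instead of two
predictions = tf.squeeze(tf.nn.sigmoid(logits), axis=-1)  # shape (batch_size,), values in [0, 1]
auc_value, auc_op = tf.metrics.auc(labels, predictions)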
Otherwise, you can just slice the predictions tensor to only consider the positive class, which will represent the binary class just the same:
auc_value, auc_op = tf.metrics.auc(labels, predictions[:, 1])
I have two massive numpy arrays of weights and biases for a CNN. I can set weights for each layer (using set_weights) but I don't see a way to set the bias for each layer. How do I do this?
You do this by using layer.set_weights(weights). From the documentation:
weights: a list of Numpy arrays. The number of arrays and their shape must match the number of the dimensions of the weights of the layer (i.e. it should match the output of `get_weights`).
You don't just put the weights for the filter in there, but one array for every parameter the layer has. The order in which you have to pass the weights depends on layer.weights. You can look at the code, or print the names of the layer's weights with something like:
print([p.name for p in layer.weights])
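A minimal sketch of setting both kernel and bias together (the layer name and the my_kernel/my_bias arrays are illustrative):

layer = model.get_layer("conv2d_1")
kernel, bias = layer.get_weights()       # current values, useful to check the expected shapes
layer.set_weights([my_kernel, my_bias])  # kernel and bias are passed together, in this order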