I have a neural network built with Keras that I'm attempting to train. The output layer has 4 nodes. For the problem I'm trying to solve, I only want to compute the gradient on a single one of the output nodes based upon the true value. Basically, y_true will look like this [0,0,2,0] where the zeros represent nodes that should be ignored. y_pred however will be of the form [1.2,3.2,4.5,6]. I'd like to make it such that only the third index is taken into account in mse. This would require that I zero out index 0, 1, and 3 in y_pred. I haven't found a proper way to do this.
Below is code I've tried, but which returns NaN from the loss function.
def custom_mse(y_true, y_pred):
    return K.mean(K.square(tf.truediv(y_pred * y_true, y_true) - y_pred), axis=-1)
Is there a way to do this simple operation on these Tensor objects?
Your version returns NaN because the masked entries end up dividing zero by zero (0/0 = NaN), and the mean then propagates that NaN. You can instead mask and rescale y_pred like this:
[1.2,3.2,4.5,6]*[0,0,2,0] = [0,0,9,0]
[0,0,9,0]/2 = [0,0,4.5,0]
and then continue normally.
This is the code to do that:
def custom_mse(y_true, y_pred):
    return K.mean(K.square(tf.divide(tf.multiply(y_pred, y_true), tf.reduce_max(y_true)) - y_true), axis=-1)
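If you prefer a mask that does not depend on the magnitude of the label, a minimal alternative sketch (my own, assuming y_true has exactly one non-zero entry per row, as in the example) builds a 0/1 mask and averages the squared error over the selected node only:

from tensorflow.keras import backend as K

def masked_mse(y_true, y_pred):
    # 1.0 where the label is non-zero, 0.0 elsewhere
    mask = K.cast(K.not_equal(y_true, 0), K.floatx())
    # squared error only on the selected node(s); masked nodes contribute 0
    squared_error = K.square((y_pred - y_true) * mask)
    # average over the selected entries instead of over all 4 nodes
    return K.sum(squared_error, axis=-1) / K.maximum(K.sum(mask, axis=-1), 1.0)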
I am performing an NLP task where I analyze a document and classify it into one of six categories. However, I do this operation at three different time periods, so the final output is an array of three integers (sparse), where each integer is the category 0-5. So a label looks like this: [1, 4, 5].
I am using BERT and am trying to decide what type of head I should attach to it, as well as what type of loss function I should use. Would it make sense to use BERT's output of size 1024 and run it through a Dense layer with 18 neurons, then reshape into something of size (3,6)?
Finally, I assume I would use Sparse Categorical Cross-Entropy as my loss function?
BERT's final hidden state has shape (512, 1024). You can either take the first token, which is the CLS token, or take the average pooling; either way your final output has shape (1024,). Now simply put 3 linear layers of shape (1024, 6), as in nn.Linear(1024, 6), on top of it and pass their outputs into the loss function below. (You can make it more complex if you want to.)
Simply add up the losses and call backward. Remember you can call loss.backward() on any scalar tensor (PyTorch).
def loss(time1output, time2output, time3output, time1label, time2label, time3label):
    loss1 = nn.CrossEntropyLoss()(time1output, time1label)
    loss2 = nn.CrossEntropyLoss()(time2output, time2label)
    loss3 = nn.CrossEntropyLoss()(time3output, time3label)
    return loss1 + loss2 + loss3
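For illustration, a minimal sketch of how the three heads and this loss could be wired together (the pooled vector and the labels here are random placeholders, not from the original post):

import torch
import torch.nn as nn

# three hypothetical heads on top of the pooled (1024,) BERT vector
heads = nn.ModuleList([nn.Linear(1024, 6) for _ in range(3)])

pooled = torch.randn(8, 1024)                            # stand-in for the CLS / averaged BERT output
labels = [torch.randint(0, 6, (8,)) for _ in range(3)]   # one integer label 0-5 per time period

time1output, time2output, time3output = [head(pooled) for head in heads]
total = loss(time1output, time2output, time3output, labels[0], labels[1], labels[2])
total.backward()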
In a typical setup you take the CLS output of BERT (a vector of length 768 in the case of bert-base and 1024 in the case of bert-large) and add a classification head (it may be a simple Dense layer with dropout). In this case the inputs are word tokens, the output of the classification head is a vector of logits for each class, and usually a regular Cross-Entropy loss function is used. Then you apply softmax to it and get probability-like scores for each class, or if you apply argmax you get the winning class. So the result is either a vector of classification scores [1x6] or the dominant class index (an integer).
(Image taken from d2l.ai.)
You can simply concatenate 3 such networks (for each time period) to get the desired result.
Obviously, I have described only one possible solution. But as it usually provides good results, I suggest you try it before moving on to more complex ones.
Finally, Sparse Categorical Cross-Entropy loss is used when the labels are sparse integer indices (say [4]), and regular Categorical Cross-Entropy loss is used when the labels are one-hot encoded (say [0, 0, 0, 0, 1, 0]). Otherwise they are exactly the same.
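If you want to stay in Keras, a rough sketch of the head described in the question (Dense with 18 units reshaped to (3, 6), trained with sparse categorical cross-entropy on labels of shape (3,)) could look like this; the pooled_output input and its size of 1024 are assumptions taken from the question, not a fixed recipe:

import tensorflow as tf
from tensorflow.keras import layers, models

# pooled (batch, 1024) BERT vector, taken as the model input here (assumption)
pooled_output = layers.Input(shape=(1024,), name="pooled_output")
logits = layers.Dense(18)(pooled_output)   # 3 time periods x 6 categories
logits = layers.Reshape((3, 6))(logits)    # (batch, 3, 6)
probs = layers.Softmax(axis=-1)(logits)    # per-period class probabilities

head = models.Model(pooled_output, probs)
# labels have shape (batch, 3) with integer categories 0-5
head.compile(optimizer="adam",
             loss=tf.keras.losses.SparseCategoricalCrossentropy())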
I'm new to Stack Overflow and I also recently started working with TensorFlow and Keras. Currently I'm developing an architecture using LSTM units. My question was partially discussed here:
What does the implementation of keras.losses.sparse_categorical_crossentropy look like?
However, in my model I have a predicted tensor, y_hat, of size (batch_size, seq_length, vocabulary_dimension) and the true labels, y, of size (batch_size, seq_length).
I would like to know how the value of the loss is computed when I call loss = sparse_categorical_crossentropy(y, y_hat). How does sparse_categorical_crossentropy calculate the loss value starting from two tensors of different dimensions?
Cross-entropy is a way to compare two probability distributions; that is, it says how different or similar the two are. It is a mathematical function defined on two arrays or continuous distributions, as shown here.
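In the discrete case the formula is H(p, q) = -Σ_x p(x) · log q(x): the negative log-probability that the predicted distribution q assigns to each outcome, weighted by the true distribution p.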
The 'sparse' part in 'sparse_categorical_crossentropy' indicates that the y_true value must have a single value per row, e.g. [0, 2, ...], that indicates which outcome (category) was the right choice. The model then outputs y_pred, which must be like [[.99, .01, 0], [.01, .5, .49], ...]. Here, the model predicts that the 0th category has a chance of .99 in the first row. This is very close to the true value, that is [1, 0, 0]. sparse_categorical_crossentropy then calculates a single number from the two distributions using the above-mentioned formula and returns that number.
If you used a 'categorical_crossentropy' it would expect the y_true to be a one-hot encoded vector, like [[0,0,1], [0,1,0], ...].
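To make the shape question concrete, here is a small sketch with made-up sizes; the loss is computed per batch element and per sequence position, so the extra vocabulary dimension of y_hat is consumed by the per-position cross-entropy and Keras then reduces the resulting values:

import tensorflow as tf

batch_size, seq_length, vocab = 2, 3, 5
y = tf.random.uniform((batch_size, seq_length), maxval=vocab, dtype=tf.int32)       # (2, 3) integer labels
y_hat = tf.nn.softmax(tf.random.uniform((batch_size, seq_length, vocab)), axis=-1)  # (2, 3, 5) probabilities

loss = tf.keras.losses.sparse_categorical_crossentropy(y, y_hat)
print(loss.shape)  # (2, 3): one value per position; Keras averages these to a scalar during training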
If you would like to know the details in depth, you can take a look at the source.
Currently, I have this output from my model:
egen = keras.models.Model(egen_input, [classes, x])
where x has [None, 32, 32, 3] and classes has [None, 2] as their dimensions. How can I reference only part of the output in a custom loss function?
for example,
def customLoss():
    def loss(y_true, y_pred):
        return keras.losses.binary_crossentropy(y_true, y_pred[0])
    return loss
Currently the above loss function gives me an error about mismatched dimensions, yet if I just use y_pred it does not raise an error. Very confused here.
Thanks!
If you want to use only classes, which is the first output, to calculate the loss, then you can set the loss_weights option (https://keras.io/models/model/) when compiling.
model.compile(...., loss_weights=[1.0, 0.0])
Note also that the loss is computed for each output separately, then combined as a weighted sum (with all weights equal to 1.0 by default) across outputs to obtain a single loss value. So y_pred[0] does not mean classes; inside each per-output loss call it means the first element of classes and of x, respectively.
EDIT:
If it's the first element of classes and x, what would be the shape of y_pred[0]? A bit confused here.
Both! Keras computes the loss for classes and x separately, then takes the weighted sum. So, if the loss function is defined as return keras.losses.binary_crossentropy(y_true, y_pred[0]) as in the question, Keras tries to calculate the loss with classes_true vs classes_pred[0], and with x_true vs x_pred[0], which raises the shape mismatch error.
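To make that concrete, a minimal sketch of the pattern suggested above, reusing the two-output egen model from the question and giving each output its own loss instead of slicing y_pred inside a single function:

# egen = keras.models.Model(egen_input, [classes, x])  # as defined in the question
egen.compile(
    optimizer="adam",
    loss=["binary_crossentropy",    # paired with classes, shape (None, 2)
          "mean_squared_error"],    # paired with x, shape (None, 32, 32, 3)
    loss_weights=[1.0, 0.0],        # weight 0.0 effectively ignores the x term
)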
I'm trying to implement my custom loss function in Keras using the TensorFlow backend. The idea is for the neural network to output coefficients for Gaussians; the loss then compares the sum of the four Gaussians to the target data, so we're fitting Gaussians to the data. I'd like to have y_pred in the form [a_0, b_0, c_0, a_1, ..., c_3], calculate the sum of a_i * e^(-(x - b_i)^2 / (2 c_i)), i = 0, 1, 2, 3, and then work out, for example, the mean absolute error comparing this function to y_true. What I tried was
def gauss_loss(y_true, y_pred):
    # zs is the size of y_true
    # the size of y_pred is 12
    xs = np.linspace(0, 1, zs)
    gauss_sum = 0
    for i in range(0, 12, 3):
        gauss_sum += y_pred[:, i] * K.exp(-(xs - y_pred[:, i+1])**2 / (2 * y_pred[:, i+2]))
    return 1. / zs * sum(K.abs(y_true - gauss_sum))
I get the error "TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn".
However, I don't think I can use tf.map_fn either because it only accepts one argument so I can't use the first entry of y_pred as coefficient a and the next as b in the same formula.
All examples I find just use tensor operations for the entire matrix. It seems to me that this might not even be possible in Keras. Is this possible and if so, how is it done?
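It is possible with plain tensor operations. A rough sketch (assuming the length of y_true, here hard-coded as ZS, is known when the loss is built, and that y_pred is laid out as [a_0, b_0, c_0, ..., c_3]) reshapes the 12 coefficients and lets broadcasting evaluate all four Gaussians at every x at once:

import numpy as np
from tensorflow.keras import backend as K

ZS = 100  # assumed length of y_true; must be known at loss-construction time

def gauss_loss(y_true, y_pred):
    xs = K.constant(np.linspace(0, 1, ZS).reshape(1, ZS, 1))  # (1, ZS, 1), broadcasts over batch and Gaussians
    coeffs = K.reshape(y_pred, (-1, 4, 3))                    # (batch, 4, [a, b, c])
    a = K.expand_dims(coeffs[:, :, 0], axis=1)                # (batch, 1, 4)
    b = K.expand_dims(coeffs[:, :, 1], axis=1)
    c = K.expand_dims(coeffs[:, :, 2], axis=1)
    gauss_sum = K.sum(a * K.exp(-(xs - b) ** 2 / (2.0 * c)), axis=-1)  # (batch, ZS)
    return K.mean(K.abs(y_true - gauss_sum), axis=-1)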
I cannot find anywhere how exactly backpropagation is done in Keras. Let me explain:
Let's say I have this network:
input = Input(shape=(X, X, Y))
x = Conv2D(32, (3, 3), padding="same")(input)
x = Conv2D(64, (3, 3), padding="same")(x)
x = Conv2D(128, (3, 3), padding="same")(x)
x = Conv2D(64, (3, 3), padding="same")(x)
x = Flatten()(x)
Output = Dense(1024)(x)
Output = Dense(6)(Output)
model = Model(input, Output)
model.compile(loss="mean_squared_error", optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
model.fit(trainingData, trainingLabels)
The output of the last layer is compared to trainingLabels, the mean squared error is computed, and backpropagation happens based on the value of the mean squared error.
However, what if I wanted to do something more? For example, I want to try every permutation of the output vector, and the one that results in the minimal mean squared error should be treated as the output, so backpropagation happens based on the permutation with the least error.
Is something like this possible in Keras? If so, how can I achieve it?
The loss argument of the model.compile method accepts a Python function. You could compute the minimum over the set of permutations in a custom loss function:
def custom_loss(y_true, y_predicted):
    ''' code here '''
and then pass it to model.compile:
model.compile(loss=custom_loss,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
See here and here for reference.
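As a sketch of what such a function could look like (my own illustration, assuming the 6-element output from the question and plain MSE as the per-permutation score, not a definitive implementation):

import itertools
import tensorflow as tf
from tensorflow.keras import backend as K

# all 720 orderings of the 6 output slots, built once at graph-construction time
PERMS = K.constant(list(itertools.permutations(range(6))), dtype='int32')  # (720, 6)

def min_permutation_mse(y_true, y_pred):
    # y_true, y_pred: (batch, 6)
    permuted_true = tf.gather(y_true, PERMS, axis=-1)                                   # (batch, 720, 6)
    errors = K.mean(K.square(permuted_true - K.expand_dims(y_pred, axis=1)), axis=-1)   # (batch, 720)
    return K.min(errors, axis=-1)   # keep only the best-matching permutation per sample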