Implementing a Gaussian-based loss function in Keras - python

I'm trying to implement a custom loss function in Keras using the TensorFlow backend. The idea is for the neural network to output coefficients for Gaussians, build the sum of four Gaussians from those coefficients, and compare that sum to the target data. So we're fitting Gaussians to the data. I'd like to have y_pred in the form [a_0, b_0, c_0, a_1, ..., c_3], calculate the sum of a_i*e^(-(x-b_i)^2/(2c_i)), i = 0, 1, 2, 3, and then compute, for example, the mean absolute error between this function and y_true. What I tried was
def gauss_loss(y_true, y_pred):
    # zs is the size of y_true
    # the size of y_pred is 12
    xs = np.linspace(0, 1, zs)
    gauss_sum = 0
    for i in range(0, 12, 3):
        gauss_sum += y_pred[:, i] * K.exp(-(xs - y_pred[:, i+1])**2 / (2 * y_pred[:, i+2]))
    return 1./zs * sum(K.abs(y_true - gauss_sum))
I get the error "TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn".
However, I don't think I can use tf.map_fn either because it only accepts one argument so I can't use the first entry of y_pred as coefficient a and the next as b in the same formula.
All examples I find just use tensor operations for the entire matrix. It seems to me that this might not even be possible in Keras. Is this possible and if so, how is it done?
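For reference, this can be done purely with tensor operations. Below is a minimal sketch (my own, assuming zs, the length of y_true, is fixed and known when the loss is defined): the 12 coefficients are sliced into a, b, c, and broadcasting builds all four Gaussians at once.

import numpy as np
from keras import backend as K

def gauss_loss_vectorized(y_true, y_pred, zs=100):     # zs assumed known ahead of time
    xs = K.constant(np.linspace(0, 1, zs))             # shape (zs,)
    a = K.expand_dims(y_pred[:, 0::3], axis=-1)        # (batch, 4, 1): a_0..a_3
    b = K.expand_dims(y_pred[:, 1::3], axis=-1)        # (batch, 4, 1): b_0..b_3
    c = K.expand_dims(y_pred[:, 2::3], axis=-1)        # (batch, 4, 1): c_0..c_3
    gauss = a * K.exp(-(xs - b) ** 2 / (2.0 * c))      # broadcasts to (batch, 4, zs)
    gauss_sum = K.sum(gauss, axis=1)                   # (batch, zs): sum of the four Gaussians
    return K.mean(K.abs(y_true - gauss_sum), axis=-1)  # per-sample mean absolute error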

Related

Loss function for comparing two vectors for categorization

I am performing an NLP task where I analyze a document and classify it into one of six categories. However, I do this operation at three different time periods, so the final output is an array of three integers (sparse), where each integer is a category 0-5. So a label looks like this: [1, 4, 5].
I am using BERT and am trying to decide what type of head I should attach to it, as well as what type of loss function I should use. Would it make sense to use BERT's output of size 1024 and run it through a Dense layer with 18 neurons, then reshape into something of size (3,6)?
Finally, I assume I would use Sparse Categorical Cross-Entropy as my loss function?
The BERT final hidden state is (512, 1024). You can either take the first token, which is the CLS token, or take the average pooling. Either way your final output has shape (1024,). Now simply add 3 linear layers of shape (1024, 6), as in nn.Linear(1024, 6), and pass their outputs into the loss function below. (You can make it more complex if you want to.)
Simply add up the losses and call backward. Remember you can call loss.backward() on any scalar tensor. (PyTorch)
import torch.nn as nn

def loss(time1output, time2output, time3output, time1label, time2label, time3label):
    loss1 = nn.CrossEntropyLoss()(time1output, time1label)
    loss2 = nn.CrossEntropyLoss()(time2output, time2label)
    loss3 = nn.CrossEntropyLoss()(time3output, time3label)
    return loss1 + loss2 + loss3
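A possible usage sketch for that function (hypothetical tensors, just to show the addition and the backward call):

import torch

t1 = torch.randn(8, 6, requires_grad=True)   # logits for time period 1, shape (batch, 6)
t2 = torch.randn(8, 6, requires_grad=True)
t3 = torch.randn(8, 6, requires_grad=True)
l1 = torch.randint(0, 6, (8,))                # integer labels 0-5, shape (batch,)
l2 = torch.randint(0, 6, (8,))
l3 = torch.randint(0, 6, (8,))

total = loss(t1, t2, t3, l1, l2, l3)  # scalar tensor: sum of the three cross-entropies
total.backward()                      # gradients flow back to all three heads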
In a typical setup you take the CLS output of BERT (a vector of length 768 for bert-base and 1024 for bert-large) and add a classification head (it may be a simple Dense layer with dropout). In this case the inputs are word tokens and the output of the classification head is a vector of logits for each class, and usually a regular Cross-Entropy loss function is used. You then apply softmax to it and get probability-like scores for each class, or if you apply argmax you get the winning class. So the result is either a vector of classification scores [1x6] or the dominant class index (an integer).
[Image of a BERT classification head, taken from d2l.ai]
You can simply concatenate 3 such networks (for each time period) to get the desired result.
Obviously, I have described only one possible solution. But as it usually provides good results, I suggest you try it before moving on to more complex ones.
Finally, Sparse Categorical Cross-Entropy loss is used when output is sparse (say [4]) and regular Categorical Cross-Entropy loss is used when output is one-hot encoded (say [0 0 0 0 1 0]). Otherwise they are absolutely the same.
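As a rough sketch of the Dense-plus-reshape idea from the question (my own assembly, assuming a pooled BERT output of size 1024; layer sizes are illustrative), sparse categorical cross-entropy then works directly on integer labels of shape (3,):

import tensorflow as tf
from tensorflow.keras import layers

pooled = layers.Input(shape=(1024,))     # e.g. the CLS vector from BERT
logits = layers.Dense(3 * 6)(pooled)     # 18 logits
logits = layers.Reshape((3, 6))(logits)  # (batch, 3 time periods, 6 classes)
model = tf.keras.Model(pooled, logits)

# labels have shape (batch, 3) with integer classes 0-5, e.g. [1, 4, 5]
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))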

Tensorflow with Keras: sparse_categorical_crossentropy

I'm new on StackOverflow and I also recently started to work with Tensorflow and Keras. Currently I'm developing an architecture using LSTM units. My question was partially discussed here:
What does the implementation of keras.losses.sparse_categorical_crossentropy look like?
However, in my model I have a predicted tensor, y_hat, of size (batch_size, seq_length, vocabulary_dimension) and the true labels, y, of size (batch_size, seq_length).
I would like to know how the value of the loss is computed when I call
loss = sparse_categorical_crossentropy(y, y_hat): how does sparse_categorical_crossentropy calculate the loss value starting from two tensors of different dimensions?
The cross entropy is a way to compare two probability distributions. That is, it says how different or similar the two are. It is a mathematical function defined on two arrays or continuous distributions as shown here.
The 'sparse' part in 'sparse_categorical_crossentropy' indicates that the y_true value must have a single value per row, e.g. [0, 2, ...] that indicates which outcome (category) was the right choice. The model then outputs the y_pred that must be like [[.99, .01, 0], [.01, .5, .49], ...]. Here, model predicts that the 0th category has a chance of .99 in the first row. This is very close to the true value, that is [1,0,0]. The sparse_categorical_crossentropy would then calculate a single number with two distributions using the above mentioned formula and return that number.
If you used a 'categorical_crossentropy' it would expect the y_true to be a one-hot encoded vector, like [[0,0,1], [0,1,0], ...].
If you would like to know the details in depth, you can take a look at the source.
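To see how the shapes from the question work out, here is a small sketch with random tensors (not the asker's model): the loss produces one cross-entropy value per (batch, timestep) position, using the last dimension of y_hat as the class distribution, and Keras then averages those values.

import tensorflow as tf

batch, seq_len, vocab = 2, 3, 5
y_hat = tf.nn.softmax(tf.random.uniform((batch, seq_len, vocab)), axis=-1)  # probabilities over the vocabulary
y = tf.random.uniform((batch, seq_len), maxval=vocab, dtype=tf.int32)       # one integer label per timestep

per_step = tf.keras.losses.sparse_categorical_crossentropy(y, y_hat)
print(per_step.shape)  # (2, 3): one value per token position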

how to reference one output from a multi-outputs with different dimension in Keras

Currently, I have this output from my model:
egen = keras.models.Model(egen_input, [classes,x])
where x has [None, 32, 32, 3] and classes has [None, 2] as their dimensions. How can I reference only part of the output in a custom loss function?
for example,
def customLoss():
    def loss(y_true, y_pred):
        return keras.losses.binary_crossentropy(y_true, y_pred[0])
    return loss
Currently the above loss function gives me a mismatched-dimension error, yet if I just use y_pred it does not raise an error... very confused here.
Thanks!
If you want to use only classes, which is the first output, to calculate the loss, then you can set the loss_weights option (https://keras.io/models/model/) when compiling.
model.compile(...., loss_weights=[1.0, 0.0])
Note also that the loss is computed for each output separately, then averaged (with equal weights by default) across outputs to obtain a single loss value. So y_pred[0] does not mean classes, but the first element of classes (and of x, when the loss is applied to that output).
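Concretely, a hedged sketch of that compile call (reusing the two-output model from the question; the loss names here are placeholders for whatever suits classes and x):

model = keras.models.Model(egen_input, [classes, x])
model.compile(optimizer="adam",
              loss=["binary_crossentropy", "mae"],  # one loss per output, in order
              loss_weights=[1.0, 0.0])              # only the classes output contributes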
EDIT:
If it's the first element of classes and x, what would be the shape of y_pred[0]? A bit confused here.
Both! Keras computes the loss for classes and x separately, then takes the (weighted) average. So, if the loss function is defined as return keras.losses.binary_crossentropy(y_true, y_pred[0]) as in the question, Keras tries to calculate the loss with classes_true vs classes_pred[0], and with x_true vs x_pred[0], which raises a shape-mismatch error.

Keras: handling batch size dimension for custom pearson correlation metric

I want to create a custom metric for Pearson correlation as defined here
I'm not sure how exactly to apply it to batches of y_pred and y_true
What I did:
def pearson_correlation_f(y_true, y_pred):
    y_true, _ = tf.split(y_true[:, 1:], 2, axis=1)
    y_pred, _ = tf.split(y_pred[:, 1:], 2, axis=1)
    fsp = y_pred - K.mean(y_pred, axis=-1, keepdims=True)
    fst = y_true - K.mean(y_true, axis=-1, keepdims=True)
    corr = K.mean(K.sum(fsp * fst, axis=-1)) / K.mean(
        K.sqrt(K.sum(K.square(y_pred - K.mean(y_pred, axis=-1, keepdims=True)), axis=-1) *
               K.sum(K.square(y_true - K.mean(y_true, axis=-1, keepdims=True)), axis=-1)))
    return corr
Is it necessary for me to use keepdims and handle the batch dimension manually and then take the mean over it? Or does Keras somehow do this automatically?
When you use K.mean without an axis, Keras automatically calculates the mean for the entire batch.
And the backend already has standard deviation functions, so it might be cleaner (and perhaps faster) to use them.
If your true data is shaped like (BatchSize, 1), I'd say keepdims is unnecessary. Otherwise I'm not sure, and it would be good to test the results.
(I don't understand why you use split, but it also seems unnecessary.)
So, I'd try something like this:
fsp = y_pred - K.mean(y_pred)  # K.mean without an axis is a scalar here, so it is automatically subtracted from all elements in y_pred
fst = y_true - K.mean(y_true)
devP = K.std(y_pred)
devT = K.std(y_true)
return K.mean(fsp*fst)/(devP*devT)
If it's relevant to have the loss for each feature instead of putting them all in the same group:
#original shapes: (batch, 10)
fsp = y_pred - K.mean(y_pred, axis=0)  # take the mean over the batch, keeping the features separate
fst = y_true - K.mean(y_true, axis=0)
#mean shape: (1,10)
#fsp and fst keep shape (batch,10)
devP = K.std(y_pred, axis=0)
devT = K.std(y_true, axis=0)
#dev shape: (1,10)
return K.sum(K.mean(fsp*fst, axis=0)/(devP*devT))
#mean shape: (1,10), making all tensors in the expression (1,10)
#sum is only necessary because we need a single loss value
Summing the results for the ten features or taking their mean is essentially the same, one being 10 times the other (this is not very relevant to Keras models, affecting only the learning rate, and many optimizers quickly find their way around it).
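Put together as a complete metric, the first variant might look like the sketch below (my own assembly, assuming plain 2-D y_true/y_pred with no need for the split from the question; K.epsilon() is added only to guard against division by zero):

from keras import backend as K

def pearson_correlation(y_true, y_pred):
    # whole-batch Pearson correlation, treating all elements as one sample set
    fsp = y_pred - K.mean(y_pred)
    fst = y_true - K.mean(y_true)
    return K.mean(fsp * fst) / (K.std(y_pred) * K.std(y_true) + K.epsilon())

# usage: model.compile(optimizer='adam', loss='mse', metrics=[pearson_correlation])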

Keras/Tensorflow custom loss function

I have a neural network built with Keras that I'm attempting to train. The output layer has 4 nodes. For the problem I'm trying to solve, I only want to compute the gradient on a single one of the output nodes, based upon the true value. Basically, y_true will look like this: [0,0,2,0], where the zeros represent nodes that should be ignored. y_pred, however, will be of the form [1.2,3.2,4.5,6]. I'd like to make it such that only the third index is taken into account in the mse. This would require that I zero out indices 0, 1, and 3 in y_pred. I haven't found a proper way to do this.
Below is code I've tried, but which returns NaN from the loss function.
def custom_mse(y_true, y_pred):
    # dividing by y_true is undefined wherever y_true == 0, which is where the NaN comes from
    return K.mean(K.square(tf.truediv(y_pred*y_true, y_true) - y_pred), axis=-1)
Is there a way to do this simple operation on these Tensor objects?
You can do it like this:
[1.2,3.2,4.5,6]*[0,0,2,0] = [0,0,9,0]
[0,0,9,0]/2 = [0,0,4.5,0]
and then continue normally.
This is the code to do that:
def custom_mse(y_true, y_pred):
    return K.mean(K.square(tf.divide(tf.multiply(y_pred, y_true), tf.reduce_max(y_true)) - y_true), axis=-1)
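An alternative sketch (my own, not from the answer above) uses an explicit mask, so nothing is ever divided by y_true, and averages the squared error only over the positions that are actually scored:

from keras import backend as K

def masked_mse(y_true, y_pred):
    mask = K.cast(K.not_equal(y_true, 0), K.floatx())   # 1 where y_true is non-zero, 0 elsewhere
    sq_err = K.square((y_pred - y_true) * mask)          # squared error only at the kept positions
    return K.sum(sq_err, axis=-1) / (K.sum(mask, axis=-1) + K.epsilon())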
