External loss function in Tensorflow - python

I have a simple 2-layer dense NN that I want to use as a regression model to compute 4 numbers from ~700 features of an image. Unfortunately, I do not have ground-truth values, so I use a custom loss function. Here's the source of the function:
def loss_function(logits, img, g, compare_img):
    # g is the external color gamma function
    final_img = img_pipeline(vga_8b=img, g=g)
    with tf.name_scope('Loss'):
        loss = score(gt_image=compare_img, curr_img=final_img)
    return loss
Here logits are the 4 numbers currently being evaluated, g is an interpolated function used as the color gamma for the image, and img is an external grayscale image used to generate the final result image that is passed to the score function. compare_img is not a ground-truth image but a set of statistical values (kept in a Python dict) that the score function uses to evaluate the currently produced image.
Unfortunately, I can't feed g and compare_img, as they are a Python function and a Python dictionary, which cannot be converted to tensors.
Is there a way to hack it somehow and achieve the desired result?
Thanks in advance!

You can call external Python functions from TensorFlow (for example with tf.py_func), but I'm afraid TensorFlow cannot compute gradients through them, and your loss function needs to be differentiable in every case. So you have to write the function with TensorFlow ops.
For your dict values you can create a lookuptable with
table = tf.contrib.lookup.HashTable(
    tf.contrib.lookup.KeyValueTensorInitializer(keys, values), -1)

Related

Loss function for comparing two vectors for categorization

I am performing a NLP task where I analyze a document and classify it into one of six categories. However, I do this operation at three different time periods. So the final output is an array of three integers (sparse), where each integer is the category 0-5. So a label looks like this: [1, 4, 5].
I am using BERT and am trying to decide what type of head I should attach to it, as well as what type of loss function I should use. Would it make sense to use BERT's output of size 1024 and run it through a Dense layer with 18 neurons, then reshape into something of size (3,6)?
Finally, I assume I would use Sparse Categorical Cross-Entropy as my loss function?
The BERT final hidden state has shape (512,1024). You can either take the first token, which is the CLS token, or average-pool over the tokens. Either way your final output has shape (1024,). Now simply put 3 linear layers of shape (1024,6) on top of it, as in nn.Linear(1024,6), and pass their outputs into the loss function below. (You can make it more complex if you want to.)
Simply add up the losses and call backward. Remember that in PyTorch you can call loss.backward() on any scalar tensor.
import torch.nn as nn
criterion = nn.CrossEntropyLoss()
def loss(time1output, time2output, time3output, time1label, time2label, time3label):
    # One cross-entropy term per time period; their sum is a scalar tensor.
    loss1 = criterion(time1output, time1label)
    loss2 = criterion(time2output, time2label)
    loss3 = criterion(time3output, time3label)
    return loss1 + loss2 + loss3
In a typical setup you take the CLS output of BERT (a vector of length 768 for bert-base and 1024 for bert-large) and add a classification head (it may be a simple Dense layer with dropout). In this case the inputs are word tokens and the output of the classification head is a vector of logits, one per class, and usually a regular Cross-Entropy loss function is used. Then you apply softmax to it and get probability-like scores for each class, or if you apply argmax you get the winning class. So the result might be either a vector of classification scores [1x6] or the dominant class index (an integer).
Image taken from d2l.ai
You can simply concatenate 3 such networks (for each time period) to get the desired result.
Obviously, I have described only one possible solution. But as it usually provides good results, I suggest you try it before moving on to more complex ones.
Finally, Sparse Categorical Cross-Entropy loss is used when the labels are integer class indices (say [4]), and regular Categorical Cross-Entropy loss is used when the labels are one-hot encoded (say [0 0 0 0 1 0]). Otherwise they compute exactly the same loss.
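To make the last point concrete, here is a minimal NumPy sketch (not Keras or PyTorch code) of the 18-unit head reshaped to (3,6), with one cross-entropy term per time period. The random head_output and the helper names are assumptions made up for illustration. The label [1, 4, 5] is the one from the question.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def sparse_cross_entropy(logits, labels):
    # labels are integer class indices, one per row of logits.
    logp = log_softmax(logits)
    return -logp[np.arange(len(labels)), labels]

def categorical_cross_entropy(logits, one_hot):
    # one_hot has the same shape as logits.
    return -(one_hot * log_softmax(logits)).sum(axis=-1)

rng = np.random.default_rng(0)
head_output = rng.normal(size=18)    # hypothetical output of an 18-unit Dense layer
logits = head_output.reshape(3, 6)   # 3 time periods x 6 categories
labels = np.array([1, 4, 5])         # the example label from the question

sparse = sparse_cross_entropy(logits, labels)
dense = categorical_cross_entropy(logits, np.eye(6)[labels])
total_loss = sparse.sum()            # one loss per time period, summed
```

Both variants give the same three per-period losses, so the choice only depends on how the labels are stored.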

Gradient of neural network with respect to inputs

I am working on a NN with Pytorch which simply maps points from the plane into real numbers, for example
model = nn.Sequential(nn.Linear(2,2),nn.ReLU(),nn.Linear(2,1))
What I want to do, since this network defines a map h:R^2->R, is to compute the gradient of this mapping h in the training loop. So for example
for it in range(epochs):
    pred = model(X_train)
    grad = torch.autograd.grad(pred, X_train)
    ....
The training set has been defined as a tensor requiring the gradient. My problem is that even though the output for each fixed point is a scalar, I am propagating a set of N=100 points, so the output is actually an Nx1 tensor. This leads to the error: autograd can compute the gradient just of scalar functions.
In fact, trying with the little change
pred = torch.sum(model(X_train))
everything works perfectly. However I am interested in all the single gradients so, is there a way to compute all these gradients together?
Actually, computing the sum as presented above gives exactly the result I expect, of course, but I wanted to know if this is the only possibility.
There are other possibilities, but using .sum() is the simplest way. Summing the output vector and computing dpred/dinput will give you the desired result. Here is why:
pred = sum_i f(x_i),
where i is the index of the input point x_i.
dpred/dinput is then the matrix [dpred/dx_0, dpred/dx_1, ..., dpred/dx_(N-1)].
Consider dpred/dx_0: it is equal to df(x_0)/dx_0, because df(x_i)/dx_0 is 0 for every i other than 0.
PS: Please excuse the crappy mathematical expressions... SO does not support latex/math expressions.
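The argument can be checked numerically. Below is a rough NumPy sketch (not PyTorch) that rebuilds a network of the same shape as the one in the question with made-up random weights. It compares the analytic per-point gradients with finite differences of the summed output. All names and weights here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)   # stands in for nn.Linear(2, 2)
w2, b2 = rng.normal(size=2), rng.normal()              # stands in for nn.Linear(2, 1)

def h(X):
    # Applies the network row by row: X has shape (N, 2), result has shape (N,).
    return np.maximum(X @ W1.T + b1, 0.0) @ w2 + b2

def per_point_gradients(X):
    # Analytic gradient of h at each row of X, shape (N, 2).
    active = (X @ W1.T + b1 > 0).astype(float)   # ReLU derivative
    return (active * w2) @ W1

def gradient_of_sum(X, eps=1e-6):
    # Central finite differences of sum(h(X)) with respect to every entry of X.
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (h(X + step).sum() - h(X - step).sum()) / (2 * eps)
    return grad

X = rng.normal(size=(100, 2))
grads = per_point_gradients(X)   # row i is dh(x_i)/dx_i
summed = gradient_of_sum(X)      # gradient of the summed output w.r.t. X
```

The two arrays agree row by row, which is why calling torch.autograd.grad on pred.sum() returns all the single gradients at once.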

What is the meaning of the result of model.predict() function for semantic segmentation?

I use Segmentation Models library for multi-class (in my case 4 class) semantic segmentation. The model (UNet with 'resnet34' backbone) is trained with 3000 RGB (224x224x3) images. The accuracy is around 92.80%.
1) Why does the model.predict() function require a (1,224,224,3) shaped array as input? I didn't find the answer even in the Keras documentation. The code below works and I have no problem with it, but I want to understand the reason.
predictions = model.predict(test_image.reshape(-1, 224, 224, 3))
2) predictions is a (1,224,224,3) shaped numpy array. Its data type is float32 and it contains floating-point numbers. What is the meaning of the numbers inside this array? How can I visualize them? I assumed that the result array would contain one of the 4 class labels (from 0 to 3) for every pixel, and that I would then apply the color map for each class. In other words, the result should have been a prediction map, but I didn't get one. To understand better what I mean by a prediction map, please see Jeremy Jordan's blog post about semantic segmentation.
result = predictions[0]
plt.imshow(result) # import matplotlib.pyplot as plt
3) What I finally want to do is like Github: mrgloom - Semantic Segmentation Categorical Crossentropy Example did in visualy_inspect_result function.
1) The image input shape in your deep neural network architecture is (224,224,3), so width=height=224 with 3 color channels. You need an additional (batch) dimension so that you can give more than one image at a time to your model. So the input is (1,224,224,3) or (something,224,224,3).
2) According to the docs of the Segmentation Models repo, you can specify the number of output classes: model = Unet('resnet34', classes=4, activation='softmax'). Your labelled images should then be reshaped to (1,224,224,4), where the last dimension is a mask channel indicating with a 0 or 1 whether pixel i,j belongs to class k. Then you can predict and access each output mask:
masked = model.predict(np.array([im]))[0]
mask_class0 = masked[:,:,0]
mask_class1 = masked[:,:,1]
3) Then you can plot the semantic segmentation with matplotlib, or use the color.label2rgb function from scikit-image.
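As a rough illustration of how per-class scores become the prediction map the question asks about, the NumPy sketch below takes the argmax over the class axis and then indexes a color palette. The random predictions array only stands in for the output of model.predict(), and the palette is a made-up assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for model.predict(...): random scores normalised like a softmax output.
scores = rng.random(size=(1, 224, 224, 4))
predictions = scores / scores.sum(axis=-1, keepdims=True)

# Most likely class per pixel: shape (224, 224), values 0..3.
label_map = np.argmax(predictions[0], axis=-1)

# Hypothetical palette: one RGB colour per class.
palette = np.array(
    [[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.uint8
)
color_image = palette[label_map]   # shape (224, 224, 3)
```

color_image can then be passed to plt.imshow() like any other RGB image.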

Tensorflow: Weighted sparse_softmax_cross_entropy for imbalanced classes across a single image

I'm working on a binary semantic segmentation task where one class is very rare in any input image, so only a few pixels are labeled with it. When using sparse_softmax_cross_entropy, the overall error is easily decreased by ignoring this class. Now I'm looking for a way to weight the classes by a coefficient that penalizes misclassifications of this specific class more heavily than those of the other class.
The doc of the loss function states:
weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of shape [batch_size], then the loss weights apply to each corresponding sample.
If I understand this correctly, it says that specific sample in a batch get weighted differently compared to others. But this is actually not what I'm looking for. Does anyone know how to implement a weighted version of this loss function where the weights scale the importance of a specific class rather than samples?
To answer my own question:
The authors of the U-Net paper used a pre-computed weight-map to handle imbalanced classes.
The Institute for Astronomy of ETH Zurich provided a Tensorflow-based U-Net package which contains a weighted version of the Softmax function (not sparse, but they flatten their labels and logits first):
class_weights = tf.constant(np.array(class_weights, dtype=np.float32))
weight_map = tf.multiply(flat_labels, class_weights)
weight_map = tf.reduce_sum(weight_map, axis=1)
loss_map = tf.nn.softmax_cross_entropy_with_logits_v2(logits=flat_logits, labels=flat_labels)
weighted_loss = tf.multiply(loss_map, weight_map)
loss = tf.reduce_mean(weighted_loss)
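To see what the weight map does, here is a small NumPy re-statement of the same computation (a sketch, not the package's code). The four pixels, their logits and the class weights are made up for illustration.

```python
import numpy as np

def weighted_softmax_cross_entropy(flat_logits, flat_labels, class_weights):
    # flat_logits, flat_labels: (num_pixels, num_classes); flat_labels is one-hot.
    shifted = flat_logits - flat_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss_map = -(flat_labels * log_probs).sum(axis=1)        # per-pixel loss
    weight_map = (flat_labels * class_weights).sum(axis=1)   # weight of each pixel's true class
    return (loss_map * weight_map).mean()

# Hypothetical data: 4 pixels, 2 classes, class 1 is the rare one.
flat_logits = np.array([[2.0, 0.0], [1.5, 0.5], [0.2, 0.1], [1.0, -1.0]])
flat_labels = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

unweighted = weighted_softmax_cross_entropy(flat_logits, flat_labels, np.array([1.0, 1.0]))
weighted = weighted_softmax_cross_entropy(flat_logits, flat_labels, np.array([1.0, 10.0]))
```

Because the labels are one-hot, weight_map simply picks the weight of each pixel's true class, so the misclassified rare-class pixel contributes ten times more to the weighted loss.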

Any examples about how to use tf.nn.l2_loss in tensorflow?

I am a beginner with tensorflow. I am currently developing a script for a learning task in which I need to map an input image to another image. Here is my loss function:
loss = tf.reduce_mean(pred - y_)
where pred is my prediction of the image out of all the layers and y_ is the ground truth. Both have shape [batch_size x width x height x channel] ([64 x 128 x 128 x 3]). Here I simply subtract one tensor from the other and take the mean. As you can see, this is l1 loss, but what should I do if I want to change the loss function to l2 loss? I know I should use the function tf.nn.l2_loss, but the tutorial on the home page seems too advanced for me and doesn't have any examples.
Also, are there any methods in tensorflow that can return the data inside a "tensor"?
In this case squaring the difference before doing reduce_mean should give you a squared loss, so loss = tf.reduce_mean((pred-y_)*(pred-y_)) should do.
To inspect a tensor you can call tensor.eval(), but this has to happen in a context where there is enough information to compute the value of the Tensor (so all placeholders should be fed, etc).
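For reference, tf.nn.l2_loss(t) computes sum(t ** 2) / 2 without averaging, so it differs from the mean squared loss above only by a constant factor. The NumPy sketch below uses small made-up tensors in place of pred and y_ to show the relation. It also shows that a true l1 loss needs an absolute value, which the plain mean of pred - y_ lacks.

```python
import numpy as np

def l2_loss(t):
    # Same quantity tf.nn.l2_loss computes: half the sum of squared entries.
    return np.sum(t ** 2) / 2.0

rng = np.random.default_rng(0)
# Small hypothetical tensors standing in for pred and y_.
pred = rng.random(size=(4, 8, 8, 3))
y_ = rng.random(size=(4, 8, 8, 3))
diff = pred - y_

mean_signed = np.mean(diff)      # what reduce_mean(pred - y_) computes
l1 = np.mean(np.abs(diff))       # mean absolute error (l1)
mse = np.mean(diff ** 2)         # mean squared error, as in the answer
half_sum_sq = l2_loss(diff)      # unnormalised, tf.nn.l2_loss-style
```

Here mse equals 2 * half_sum_sq / diff.size, so minimizing either one leads to the same optimum. Only the scale of the gradient changes.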
