BCELoss for binary pixel-wise segmentation in PyTorch

I'm implementing a U-Net for binary segmentation using a Sigmoid activation and BCELoss. The problem is that after several iterations the network predicts very small values for every pixel, even in regions where it should predict values close to one (the ground-truth mask region). Does this give any intuition about what is going wrong?
Besides, there exists NLLLoss2d, which is used for pixel-wise loss. Currently I'm ignoring it and using MSELoss() directly. Should I use NLLLoss2d with a Sigmoid activation layer?
Thanks

It seems to me like your Sigmoids are saturating the activation maps. Either the images are not properly normalised or some batch-normalisation layers are missing. If you have an implementation that works with other images, check the image loader and make sure it does not saturate the pixel values. This usually happens with 16-bit channels. Can you share some of the input images?
PS: Sorry for posting this as an answer instead of a comment. This is a new account and I am not allowed to comment yet.
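As a quick sanity check, it is worth printing the dtype and range the loader actually produces; a minimal sketch (the file name and the PIL/NumPy usage are assumptions, adapt to your own pipeline):

import numpy as np
from PIL import Image

img = np.array(Image.open('sample.tif'))   # hypothetical input image
print(img.dtype, img.min(), img.max())     # uint16 data can reach 65535

# rescale 16-bit data to [0, 1] before feeding it to the network
if img.dtype == np.uint16:
    img = img.astype(np.float32) / np.iinfo(np.uint16).max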

You might want to use torch.nn.BCEWithLogitsLoss(), replacing the Sigmoid and the BCELoss function.
An excerpt from the docs tells you why it's always better to use this loss function implementation.
This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
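In practice the swap is mechanical: drop the final Sigmoid from the model and hand the raw logits to the loss. A minimal sketch (the shapes are illustrative, not from the question's U-Net):

import torch
import torch.nn as nn

# before: model ends in nn.Sigmoid() and the loss is nn.BCELoss()
# after: model outputs raw logits and the loss applies the sigmoid internally
criterion = nn.BCEWithLogitsLoss()

logits = torch.randn(4, 1, 64, 64)                    # raw network output, no sigmoid
target = torch.randint(0, 2, (4, 1, 64, 64)).float()  # binary ground-truth mask
loss = criterion(logits, target)

# at inference time, apply the sigmoid yourself to get probabilities
probs = torch.sigmoid(logits)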

Related

A big gap between using a scalar model output and a two-dimensional output vector when using cross-entropy for binary classification

When I use U-Net for semantic segmentation of two categories, I set the output of the last layer of the model to 1 channel and 2 channels respectively, and measure with the corresponding cross-entropy loss: BCELoss and CrossEntropyLoss.
But the gap between the two is large. The performance of the former is normal, but the latter has a very low precision rate and a high recall rate.
I'm using PyTorch.
Mathematically, BCELoss (with logits) is just a special case of CrossEntropyLoss for the case of two classes.
Are you using a sigmoid or softmax at the output of the network? In PyTorch, CrossEntropyLoss takes the raw output of the last layer (no need to softmax the output); that is done internally for numerical stability.
BCELoss only takes inputs between 0 and 1, so a sigmoid is needed there. However, PyTorch has BCEWithLogitsLoss, which applies the sigmoid for you; this version is more stable.
One more thing that it seems you are not doing correctly (it would be nicer to have some minimal code to better understand your problem): CrossEntropyLoss requires one channel per class, so if you have 2 classes, you have to give it an input with two channels. BCELoss (with logits) only takes one channel with a number ranging between 0 and 1. If I understood correctly, you are somehow giving it 2 channels, and that will cause you problems in training.
My best guess is that the gap in performance between the two is due to misuse of the loss functions. The PyTorch documentation has improved a lot; I recommend spending a few minutes understanding the difference between each of those three loss functions: https://pytorch.org/docs/stable/nn.html#loss-functions .
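To make the channel requirements concrete, here is a minimal sketch of both setups (the shapes are illustrative):

import torch
import torch.nn as nn

N, H, W = 4, 64, 64

# 1-channel setup: BCEWithLogitsLoss with a float target of the same shape
logits_1ch = torch.randn(N, 1, H, W)
target_1ch = torch.randint(0, 2, (N, 1, H, W)).float()
loss_bce = nn.BCEWithLogitsLoss()(logits_1ch, target_1ch)

# 2-channel setup: CrossEntropyLoss takes one channel per class and a
# long tensor of class indices WITHOUT a channel dimension
logits_2ch = torch.randn(N, 2, H, W)
target_2ch = torch.randint(0, 2, (N, H, W)).long()
loss_ce = nn.CrossEntropyLoss()(logits_2ch, target_2ch)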

Customized convolutional layer in TensorFlow

Let's assume I want to make the following layer in a neural network: instead of having a square convolutional filter that moves over some image, I want the shape of the filter to be some other shape, say a rectangle, circle, triangle, etc. (this is of course a silly example; the real case I have in mind is something different). How would I implement such a layer in TensorFlow?
I found that one can define custom layers in Keras by extending tf.keras.layers.Layer, but the documentation is quite limited, without many examples. A Python implementation of a convolutional layer, for example by extending tf.keras.layers.Layer, would probably help as well, but it seems that the convolutional layers are implemented in C. Does this mean that I have to implement my custom layer in C to get any reasonable speed, or would Python TensorFlow operations be enough?
Edit: Perhaps it is enough if I can just define a tensor of weights where I can constrain some entries to be identically zero and have some weights show up in multiple places in the tensor; then I should be able to build a convolutional layer (and other layers) by hand. How would I do this, and also include these variables in training?
Edit 2: Let me add some more clarifications. We can take the example of building a 5x5 convolutional layer with one output channel from scratch. If the input is, say, 10x10 (plus padding so the output is also 10x10), I would imagine doing this by creating a matrix of size 100x100. Then I would fill in the 25 weights in the correct locations in this matrix (so some entries are zero, and some entries are equal, i.e. all 25 weights will show up in many locations in this matrix). I then multiply the input by this matrix to get an output. So my question would be twofold: 1. How do I do this in TensorFlow? 2. Would this be very inefficient, and is some other approach recommended (assuming that I want to later customize what this filter looks like, and thus the standard conv2d is not good enough)?
Edit 3: It seems doable by using sparse tensors and assigning values via a previously defined tf.Variable. However, I don't know whether this approach will suffer from performance issues.
Just use regular conv layers with square filters, and zero out some values after each weight update:

import tensorflow as tf  # TF1-style graph code; sess, train_step etc. come from your training loop

# run one training step, then re-apply the mask so the
# "disabled" filter entries are reset to zero
g = tf.get_default_graph()
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
conv1_filter = g.get_tensor_by_name('conv1/weights:0')  # first-layer filters (see EDIT below)
sess.run(tf.assign(conv1_filter, tf.multiply(conv1_filter, my_mask)))

where my_mask is a binary tensor (of the same shape and type as your filters) that matches the desired pattern.
EDIT: if you're not familiar with TensorFlow, you might be confused by the code above. I recommend looking at this example, and specifically at the way the model is constructed (if you do it like this, you can access the first-layer filters as 'conv1/weights'). Also, I recommend switching to PyTorch :)
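For what it's worth, the same trick in PyTorch is straightforward; a minimal sketch (the layer shape and mask pattern here are made up for illustration):

import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 8, kernel_size=5, padding=2)
my_mask = torch.ones_like(conv1.weight)
my_mask[..., 0, 0] = 0.0  # example: permanently zero out one tap per filter

# ... after each optimizer.step() ...
with torch.no_grad():
    conv1.weight.mul_(my_mask)  # re-zero the masked entries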

how to make only some of the values in the conv filter trainable in tensorflow

So basically I want to train a CNN using TensorFlow. To cover a large area of the image, I would ideally want a large filter. However, a large filter means a larger number of variables, which could lead to more severe overfitting. So I'm thinking about using a sampling filter that has a lot of constant zeros, with the rest of the weights being trainable.
Is there a way to do this? I know tf.get_variable will just make all the weights inside the filter trainable. How do I fix some of the weights to zero during training?
There might be a way to do this via some sort of masking operation; however, I think your best bet would be to use dilated convolutions. These insert "holes" between the filter values, which sounds like what you are looking for. The standard convolution ops in TensorFlow support this: tf.nn.conv2d has a dilations argument and tf.layers.conv2d uses dilation_rate.
It is best to understand this with pictures, so here is a blog post that has some.
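As a rough illustration (a minimal sketch with made-up shapes, using the TF1-style tf.layers.conv2d named above): a 3x3 filter with dilation rate 2 samples a 5x5 area while using only 9 trainable weights.

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32, 32, 3])  # made-up input shape

# 3x3 kernel with dilation rate 2: a 5x5 receptive field, 9 weights
y = tf.layers.conv2d(x, filters=16, kernel_size=3,
                     dilation_rate=2, padding='same')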

What should the output layer of my CNN look like?

I am running a model to detect a few interesting features in an image. I have a set of images measuring 600x200 px. These images have features such as rock fragments that I would like to identify. Imagine a (4x12) grid overlaid on the image; I can manually produce annotations with an annotator tool, such as ((4,9), (3,10), (3,11), (3,12)), to identify the interesting cells in the image. I can build a CNN model with Keras with a grayscale image as input. But how should I encode the output? One way that seems intuitive to me is to treat it as a sparse matrix of shape (12,4,1) where only the interesting cells have 1 and the others have 0.
Is there a better way to encode the outputs?
What should the activation function on the last layer be? I am using ReLU for the hidden layers.
What should the loss function be? Will mean_squared_error work?
Your problem is really similar to both detection and segmentation problems (you can read about them, e.g., here). The approach you proposed is reasonable, because in both detection and segmentation tasks computing the feature map you proposed is a usual part of the training pipeline. However, there are several problems you might come across:
memory issues: you need to either deal with sparse tensors or use generators in order to deal with memory problems,
loss and activation: loss and activation for segmentation are currently not supported by the Keras API, so you need to implement them on your own. Here and here you can find examples of how to tackle this problem.
In the case of detection only (not classification of these points), I would advise you to use sigmoid and binary_crossentropy; in the case of classification, softmax and categorical_crossentropy. A sketch of the detection setup follows below.
Of course, there are other ways to tackle this problem. One could solve it as a regression where you predict the pixels where there is something interesting, but dealing with varying input in Keras is rather cumbersome.
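For the detection-only variant, a minimal Keras sketch (the conv stack is made up; the point is only the sigmoid output matching the (12,4,1) target from the question):

from tensorflow import keras
from tensorflow.keras import layers

# input: 600x200 grayscale image; output: one sigmoid score per grid cell
model = keras.Sequential([
    layers.Conv2D(16, 3, activation='relu', input_shape=(200, 600, 1)),
    layers.MaxPooling2D(4),
    layers.Conv2D(32, 3, activation='relu'),
    layers.Flatten(),
    layers.Dense(12 * 4, activation='sigmoid'),  # one probability per cell
    layers.Reshape((12, 4, 1)),                  # match the (12,4,1) target
])
model.compile(optimizer='adam', loss='binary_crossentropy')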

How to visualize features classified by tensorflow?

I'm running the default classify_image code with the ImageNet model. Is there any way to visualize the features it has extracted? If I use 'pool_3:0', that gives me the feature vector. Is there any way to overlay this on top of my image to see which features it has picked out as important?
Ross Girshick described one way to visualize what a pooling layer has learned: https://www.cs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf
Essentially, instead of visualizing features, you find the few images that a neuron fires most strongly on, and repeat that for a few (or all) neurons in your feature vector. The algorithm needs lots of images to choose from, of course, e.g. the test set.
I wrote my implementation of this idea for the cifar10 model in TensorFlow today, which I want to share (it uses OpenCV): https://gist.github.com/kukuruza/bb640cebefcc550f357c
You could use it if you manage to provide the images tensor for reading images in batches, and the pool_3:0 tensor.
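The core of the idea fits in a few lines; a minimal sketch (sess, pool3, images_tensor, and the batching logic are assumed to come from your own setup):

import numpy as np

# collect pool_3 activations for every image in the set
activations, images = [], []
for batch in dataset:  # your own batching logic
    feats = sess.run(pool3, feed_dict={images_tensor: batch})
    activations.append(feats.reshape(len(batch), -1))
    images.extend(batch)
activations = np.concatenate(activations)

# for one neuron, pick the k images that activate it most strongly
neuron, k = 42, 5
top = np.argsort(activations[:, neuron])[-k:][::-1]
top_images = [images[i] for i in top]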
