I am trying to perform a binary classification using tensorflow (V.1.1.0) with a single neuron at the output layer. The snippet below corresponds to the loss function and optimizer I am currently using (inspired from the answer here).
ratio=.034 #minority/population ratio
learning_rate=0.001
class_weight=tf.constant([[ratio,1.0-ratio]],name='unbalanced_ratio') #weight vector, (lab_feed is one_hot labels)
weight_per_label=tf.transpose(tf.matmul(lab_feed,tf.transpose(class_weight)),name='weights_per_label')
xent=tf.multiply(weight_per_label,tf.nn.sigmoid_cross_entropy_with_logits(labels=lab_feed,logits=output),name='loss')
loss=tf.reduce_mean(xent)
optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate,name='GradientDescent').minimize(loss)
My issue is however that for some reason all instances are classified as the same class after progression of epochs. Do I have to stop training in the middle or is there something wrong with the loss function?
You are misusing sigmoid cross-entropy as if it were softmax cross-entropy.
Sigmoid cross-entropy is adapted to binary classification — your problem is binary classification, so that's fine. But then, the output of your net should have only one channel per binary classification task — in your case, you have a single binary classification task, so your net should have one output channel only.
To balance a sigmoid cross-entropy you need to balance each individual part of the cross-entropy, i.e. the part coming from the positive and the part coming from the negative. This cannot be done on the output, as you are doing, because the output is already a sum of the positive and negative parts.
Hopefully there is a function in tensorflow to do just that, tf.nn.weighted_cross_entropy_with_logits. Its use is similar to tf.nn.sigmoid_cross_entropy with an additional parameter corresponding to the weight of the positive class.
What you are currently doing, is having two binary classifiers on two different channels, and sending only the negative samples to the first and the positives samples to the second. This cannot produce something useful.
Related
I’m trying to modify Yolo v1 to work with my task which each object has only 1 class. (e.g: an obj cannot be both cat and dog)
Due to the architecture (other outputs like localization prediction must be used regression) so sigmoid was applied to the last output of the model (f.sigmoid(nearly_last_output)). And for classification, yolo 1 also use MSE as loss. But as far as I know that MSE sometimes not going well compared to cross entropy for one-hot like what I want.
And specific: GT like this: 0 0 0 0 1 (let say we have only 5 classes in total, each only has 1 class so only one number 1 in them, of course this is class 5th in this example)
and output model at classification part: 0.1 0.1 0.9 0.2 0.1
I found some suggestion use nn.BCE / nn.BCEWithLogitsLoss but I think I should ask here for more correct since I’m not good at math and maybe I’m wrong somewhere so just ask to learn more and for sure what should I use correctly?
MSE loss is usually used for regression problem.
For binary classification, you can either use BCE or BCEWithLogitsLoss. BCEWithLogitsLoss combines sigmoid with BCE loss, thus if there is sigmoid applied on the last layer, you can directly use BCE.
The GT mentioned in your case refers to 'multi-class' classification problem, and the output shown doesn't really correspond to multi-class classification. So, in this case, you can apply a CrossEntropyLoss, which combines softmax and log loss and suitable for 'multi-class' classification problem.
When I use u-net for semantic segmentation of two categories, my output in the last layer of the model is set to 1 channel and 2 channel respectively. Then I use cross-entropy loss to measure: BCEloss and CrossEntropyLoss.
But the gap between the two is great. The performance of the former is normal, but the latter has a very low precision rate and a high recall rate.
I used pytorch.
Mathematically BCEloss (logist) is just a special case of CrossEntropy loss for the case of two classes.
Are you using a sigmoid or softmax in the output of the network? In PyTorch, CrossEntropy loss takes the raw output of the last layer (no need for softmax the output), that is done for numerical stability.
BCEloss only takes input in between 0 and 1. So a sigmoid is needed there. However, PyTorch has The BCEWithLogistLoss that applies the sigmoid for you, this version is more stable.
One more thing that it seems you are not doing correctly. (it would be nicer to have some minimum amount of code to better understand your problem). CrossEntropyLoss requires one channel per class. So if you have 2 classes, you have to give it an input with two channels. The logist (BCEloss) only takes one channel with a number ranging between 0 and 1. If I understood correctly, you are somehow giving it 2 channels. That will bring you problems in training.
My best guess is that the gap in performance between the two is due to misuse of the loss functions. The PyTorch documentation has improved a lot, I can recomend you spending a few minutes understanding the difference between each those three loss functions: https://pytorch.org/docs/stable/nn.html#loss-functions .
I am using VGG16 model and fine tuned them on my data. I am predicting ethnicity of images (faces) .i have 5 output classes like white, black,Asian, Sub-continent and others. Should i use softmax or sigmoid. And why??
Sigmoid:
Softmax:
When you use a softmax, basically you get a probability of each class, (join distribution and a multinomial likelihood) whose sum is bound to be one.
In the case of softmax, increasing the output value of one class makes the others go down (because sum=1 always). If you plan to find exactly one value (which is the case in your ethnicity classifier) you should use softmax function. The character of this function is “there can be only one”. So these are ideally used in multi-class problems like your problem.
Things are different for the sigmoid function. This function can provide us with the top n results based on the threshold. The feature of the sigmoid is to emphasize multiple values (yes, can be more than one, hence called "multi-label"), based on the threshold, and we use it for the multi-label classification problems.
In general cases, if you are dealing with multi-class clasification problems, you should use a Softmax because you are guaranted that the sum of probabilities of all clases will sum 1, by weighting them individually and computing the join distribution, whereas with a Sigmoid, you'd be predicting the probability of each class individually, but not necesarilly weighted. If not careful and aware of the difference you can run into some issues with your output.
I am trying doing OCC (one class classification) using an autoencoder based neural network.
To make a long story short I train my neural network with 200 matrices each containing 128 dataelements. Those are then compressed (see autoencoder).
Once the training is done I pass a new matrix to my neural net (test data) and based on the loss function I know whether the data I passed to it belongs to the target class or not.
I would like to know how I can compute a classification confidence in % based on the loss function I obtain when passing test data.
Thanks
In case it helps I am using Tensorflow
Well actually normally you try to minimize your cost function (or in the case of one training observation your loss function). Normally the probability of the class you want to predict is not done using the loss function, but using a sigmoid output layer for example. You need a function that goes from 0 to 1 and that behaves like a probability. Where did you get the idea of using the loss function to evaluate your proabibility? But I am not an expert in one class classification (or outliers detection)... I guess you actually want the probability of your observation of not belonging to your class right?
I have been playing with Lasagne for a while now for a binary classification problem using a Convolutional Neural Network. However, although I get okay(ish) results for training and validation loss, my validation and test accuracy is always constant (the network always predicts the same class).
I have come across this, someone who has had the same problem as me with Lasagne. Their solution was to setregression=True as they are using Nolearn on top of Lasagne.
Does anyone know how to set this same variable within Lasagne (as I do not want to use Nolearn)? Further to this, does anyone have an explanation as to why this needs to happen?
Looking at the code of the NeuralNet class from nolearn, it looks like the parameter regression is used in various places, but most of the times it affects how the output value and loss are computed.
In case of regression=False (the default), the network outputs the class with the maximum probability, and computes the loss with the categorical crossentropy.
On the other hand, in case of regression=True, the network outputs the probabilities of each class, and computes the loss with the squared error on the output vector.
I am not an expert in deep learning and CNN, but the reason this may have worked is that in case of regression=True and if there is a small error gradient, applying small changes to the network parameters may not change the predicted class and the associated loss, and may lead the algorithm to "think" that it has converged. But if instead you look at the class probabilities, small parameter changes will affect the probabilities and the resulting mean squared error, and the network will continue down this path which may eventually change the predictions.
This is just a guess, it is hard to tell without seeing the code and the dataset.