I am trying to do OCC (one-class classification) using an autoencoder-based neural network.
To make a long story short, I train my neural network with 200 matrices, each containing 128 data elements. These are then compressed (see autoencoder).
Once training is done, I pass a new matrix (test data) to my neural net, and based on the loss value I know whether the data belongs to the target class or not.
I would like to know how I can compute a classification confidence in % based on the loss value I obtain when passing test data.
Thanks
In case it helps, I am using TensorFlow.
Well, actually, normally you try to minimize your cost function (or, in the case of a single training observation, your loss function). The probability of the class you want to predict is normally not obtained from the loss function, but from a sigmoid output layer, for example. You need a function that goes from 0 to 1 and behaves like a probability. Where did you get the idea of using the loss function to evaluate your probability? But I am not an expert in one-class classification (or outlier detection)... I guess you actually want the probability of your observation not belonging to your class, right?
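If what you are after is just some kind of score between 0 and 1, one crude workaround (not a true probability, and not something the loss function gives you by itself) is to compare the test reconstruction error against the distribution of errors on the training data. A minimal sketch in NumPy, assuming you kept the per-sample training losses (the numbers below are purely illustrative):

import numpy as np

def confidence_score(test_error, train_errors):
    # Fraction of training reconstruction errors that are at least as large as
    # the test error: 1.0 ~ very typical of the target class, 0.0 ~ worse than
    # anything seen during training. This is a ranking score, not a calibrated probability.
    train_errors = np.asarray(train_errors)
    return float(np.mean(train_errors >= test_error))

train_errors = [0.021, 0.034, 0.018, 0.052, 0.027]  # per-sample training losses (illustrative)
print(confidence_score(0.025, train_errors))         # 0.6 -> looks like the target class
print(confidence_score(0.500, train_errors))         # 0.0 -> very unlike the training data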
Related
I've been attempting to implement my own GAN with some limited success. I tweaked a bit of how they trained it in the tutorial and was wondering if this more drastic change is viable. The first change I made was to make the discriminator classify n+1 classes, where n could be, for example, 10 in MNIST, and the (n+1)-th class is the fake class. The discriminator is an architecture I originally built from scratch as a really good classifier and then imported here. Then my GAN will have the "opposite" of the traditional NLLLoss.
Here is the tricky, non-traditional part. Because I have a softmax in the last layer of my discriminator, its output will always be between 0 and 1. So I can create a custom loss function for the generator that is a horizontal flip of NLLLoss and tries to make sure the discriminator does not classify the fakes as the (n+1)-th class. The idea is that I don't care which class the fakes are classified as, as long as it is not the (n+1)-th class. This misclassification behavior is what I want to maximize for the generator.
Here is the function I plotted on desmos to give some visualization:
https://www.desmos.com/calculator/6gdqs28ihk
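In symbols, if p_c denotes the discriminator's softmax probability for class c, the standard NLLLoss for a real sample with label c is -log(p_c), while the flipped generator objective is -log(1 - p_(n+1)): it is near zero when the discriminator puts almost no mass on the fake class, and grows without bound as p_(n+1) approaches 1.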
My actual code for the generator loss function is below, while my discriminator loss function is the traditional NLLLoss:
loss_G = torch.mean(-torch.log(1 - outputG.float()[:,classes]))
Please let me know if this is completely wrong or there is an easier way.
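For reference, here is a self-contained sketch of what the loss above computes, with an explicit clamp added to avoid log(0); the names (generator_loss, disc_probs, fake_idx) are mine, not from my actual code:

import torch

def generator_loss(disc_probs, fake_idx, eps=1e-7):
    # disc_probs: (batch, n+1) softmax probabilities from the discriminator on fakes
    p_fake = disc_probs[:, fake_idx]             # probability mass on the fake class
    # -log(1 - p_fake) is small when the fake is classified as any real class;
    # the clamp avoids log(0) when the discriminator is very confident
    return torch.mean(-torch.log((1.0 - p_fake).clamp(min=eps)))

# usage sketch: 10 real classes (MNIST) plus 1 fake class at index 10
logits = torch.randn(8, 11, requires_grad=True)
loss_G = generator_loss(torch.softmax(logits, dim=1), fake_idx=10)
loss_G.backward()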
I'm a bit of a beginner in the art of machine learning. Here is a rather conceptual question I've been wondering about:
Suppose I have a function X -> Y, say y = x^2. Then, by generating enough (x, y) data, I can train a neural network to perform regression on the function and get approximately x^2 for any input x. This is basically what the Universal Approximation Theorem suggests.
Now, my question is: what if I want the inverse relation, Y -> X? In this case, X is a multi-valued function of Y; for instance, for y > 0, x = +-sqrt(y). I can swap X and Y as input/output data to train the network, but for any given y there should be a random 1/2 - 1/2 chance that x = sqrt(y) or x = -sqrt(y). Of course, if one trains it with mean squared error, the network won't know this is a multi-valued function; it will just follow SGD on the loss function and predict x = 0, the average value, for any given y.
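(To make the collapse concrete, here is a tiny TensorFlow/Keras sketch of this experiment; the architecture is arbitrary, just enough to fit the data.)

import numpy as np
import tensorflow as tf

x = np.random.uniform(-1.0, 1.0, size=(10000, 1)).astype("float32")
y = x ** 2                                              # forward function y = x^2

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(y, x, epochs=20, batch_size=128, verbose=0)   # fit the *inverse* direction y -> x

print(model.predict(np.array([[0.25]], dtype="float32")))  # typically near 0, not +-0.5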
Therefore, I wonder whether there is any way a neural network can model a multi-valued function. For instance, my guesses would be:
(1) The neural network could output a collection of, say, the top 2 possible values for X and be trained with cross-entropy. The problem is, if X is a vector or even a matrix (like a bitmap image) instead of a number, we don't know how many X solutions a given Y has (there could very well be infinitely many, i.e. a continuous range), so a "list" of possible values and probabilities won't work; ideally the neural network should output values randomly and continuously distributed across the possible X solutions.
(2) Perhaps this falls into the realm of probabilistic neural networks (PNNs)? Do PNNs model functions whose output is a given probability distribution (continuous or discrete) over vectors? If so, is it possible to implement a PNN with popular frameworks like TensorFlow + Keras?
(Also, note that this is different from a "multivariate" function, where X and Y can be multi-component vectors, which a traditional network can still easily be trained on. The actual problem here is that the output could be a probability distribution over vectors, which a simple feed-forward network doesn't capture, since it has no inherent randomness.)
Thank you for your kind help!
Image of the forward function y = x^2 (can easily be modeled by a network with regression)
Image of the inverse function x = +-sqrt(y) (the network cannot capture the two-valued relation and outputs the average value x = 0 for any y)
Try to read the following paper:
https://onlinelibrary.wiley.com/doi/abs/10.1002/ecjc.1028
Mifflin's algorithm (or its more general version, SLQP-GS), mentioned in this paper, is available here, and the corresponding paper with a description is here.
Say we train a multilayer NN in TensorFlow for a regression task (i.e. the multi-input, multi-output case). Then we have new instances, we apply the trained model, and of course we get the corresponding outputs. Is there a way to backpropagate the outputs and reconstruct the inputs in TensorFlow in an easy/efficient manner? What I am thinking is to then use the difference between the original and the reconstructed inputs of the new instances as a QC measure, i.e. if the reconstructed inputs are not close enough to the originals, then we have a problem, etc. I hope I am making myself clear.
No, unfortunately you cannot take a trained model and recover the corresponding input from an output. The reason is that there are infinitely many possible inputs for each output (even a trivial model like y = x1 + x2 has infinitely many input pairs producing the same output).
Furthermore, backpropagation is not a matter of passing an output backwards through the network. It is the process of determining to what extent each parameter in the model contributes to the loss function. It will not give you the inputs to the hidden layers, only the extent to which the weights affected your decision.
I am trying to perform binary classification using TensorFlow (v1.1.0) with a single neuron at the output layer. The snippet below shows the loss function and optimizer I am currently using (inspired by the answer here).
ratio=.034 #minority/population ratio
learning_rate=0.001
class_weight=tf.constant([[ratio,1.0-ratio]],name='unbalanced_ratio') #weight vector, (lab_feed is one_hot labels)
weight_per_label=tf.transpose(tf.matmul(lab_feed,tf.transpose(class_weight)),name='weights_per_label')
xent=tf.multiply(weight_per_label,tf.nn.sigmoid_cross_entropy_with_logits(labels=lab_feed,logits=output),name='loss')
loss=tf.reduce_mean(xent)
optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate,name='GradientDescent').minimize(loss)
My issue, however, is that for some reason all instances end up classified as the same class as the epochs progress. Do I have to stop training early, or is there something wrong with the loss function?
You are misusing sigmoid cross-entropy as if it were softmax cross-entropy.
Sigmoid cross-entropy is adapted to binary classification — your problem is binary classification, so that's fine. But then, the output of your net should have only one channel per binary classification task — in your case, you have a single binary classification task, so your net should have one output channel only.
To balance a sigmoid cross-entropy you need to balance each individual part of the cross-entropy, i.e. the part coming from the positive and the part coming from the negative. This cannot be done on the output, as you are doing, because the output is already a sum of the positive and negative parts.
Fortunately, there is a function in TensorFlow that does just that: tf.nn.weighted_cross_entropy_with_logits. Its use is similar to tf.nn.sigmoid_cross_entropy_with_logits, with an additional parameter corresponding to the weight of the positive class.
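For reference, it computes (per example) something like -(pos_weight * y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))), where z is the logit and y the 0/1 target, so the positive part of the loss is scaled while the negative part is left untouched.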
What you are currently doing is having two binary classifiers on two different channels, and sending only the negative samples to the first and only the positive samples to the second. This cannot produce anything useful.
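For concreteness, here is a sketch of how the snippet above could be rewritten with a single output channel; the placeholder definitions are only stand-ins for your actual graph, and the choice of pos_weight is one common option, not the only one:

import tensorflow as tf

ratio = 0.034                                       # minority / population, as in your snippet
lab_feed = tf.placeholder(tf.float32, [None, 1])    # a single 0/1 label per example
output = tf.placeholder(tf.float32, [None, 1])      # a single logit per example (your net's one output channel)

pos_weight = (1.0 - ratio) / ratio                  # up-weight the rare positive class
xent = tf.nn.weighted_cross_entropy_with_logits(
    targets=lab_feed,                               # this argument is named `labels` in newer TF versions
    logits=output,
    pos_weight=pos_weight)
loss = tf.reduce_mean(xent)

You can then feed this loss to the same GradientDescentOptimizer as before.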
I have been playing with Lasagne for a while now for a binary classification problem using a Convolutional Neural Network. However, although I get okay(ish) results for training and validation loss, my validation and test accuracy is always constant (the network always predicts the same class).
I have come across this, from someone who has had the same problem as me with Lasagne. Their solution was to set regression=True, as they are using nolearn on top of Lasagne.
Does anyone know how to set this same variable within Lasagne (as I do not want to use nolearn)? Further to this, does anyone have an explanation as to why this needs to happen?
Looking at the code of the NeuralNet class from nolearn, it looks like the parameter regression is used in various places, but most of the time it affects how the output value and loss are computed.
In the case of regression=False (the default), the network outputs the class with the maximum probability and computes the loss with categorical cross-entropy.
On the other hand, in the case of regression=True, the network outputs the probabilities of each class and computes the loss with the squared error on the output vector.
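For reference, in plain Lasagne the two setups roughly correspond to something like the following (a sketch of my understanding; the tiny network and the variable names are mine):

import theano.tensor as T
import lasagne

input_var = T.matrix('inputs')
target_var = T.matrix('targets')    # one-hot targets, shape (batch, num_classes)

l_in = lasagne.layers.InputLayer((None, 10), input_var=input_var)
l_hid = lasagne.layers.DenseLayer(l_in, num_units=32, nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.DenseLayer(l_hid, num_units=2, nonlinearity=lasagne.nonlinearities.softmax)

prediction = lasagne.layers.get_output(network)

# regression=False (nolearn's default): categorical cross-entropy on the class probabilities
loss_ce = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()

# regression=True: squared error between the probability vector and the one-hot target
loss_mse = lasagne.objectives.squared_error(prediction, target_var).mean()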
I am not an expert in deep learning and CNNs, but the reason this may have worked is that in the case of regression=False, when the error gradient is small, applying small changes to the network parameters may not change the predicted class or the associated loss, and may lead the algorithm to "think" that it has converged. If instead you look at the class probabilities (regression=True), small parameter changes will affect the probabilities and the resulting mean squared error, and the network will continue down this path, which may eventually change the predictions.
This is just a guess, it is hard to tell without seeing the code and the dataset.