Convolutional Neural Network accuracy with Lasagne (regression vs classification) - python

I have been playing with Lasagne for a while now for a binary classification problem using a Convolutional Neural Network. However, although I get okay(ish) results for training and validation loss, my validation and test accuracy is always constant (the network always predicts the same class).
I have come across this, from someone who has had the same problem as me with Lasagne. Their solution was to set regression=True, as they are using nolearn on top of Lasagne.
Does anyone know how to set this same variable within Lasagne (as I do not want to use Nolearn)? Further to this, does anyone have an explanation as to why this needs to happen?

Looking at the code of the NeuralNet class from nolearn, the parameter regression is used in various places, but it mainly affects how the output value and the loss are computed.
In case of regression=False (the default), the network outputs the class with the maximum probability, and computes the loss with the categorical crossentropy.
On the other hand, in case of regression=True, the network outputs the probabilities of each class, and computes the loss with the squared error on the output vector.
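There is no single regression switch in plain Lasagne; you pick this behaviour yourself when you build the loss expression. A minimal sketch of what the regression=True path looks like in raw Lasagne/Theano, assuming network is your output layer with a softmax nonlinearity, input_var is the Theano input variable, and target_var holds one-hot float targets (all three are placeholders here):
import theano
import lasagne
# Class probabilities from the softmax output layer.
prediction = lasagne.layers.get_output(network)
# nolearn with regression=False would use categorical cross-entropy:
# loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()
# regression=True is roughly equivalent to a squared error on the probability vector
# (target_var must then be a one-hot float matrix, not an integer class vector):
loss = lasagne.objectives.squared_error(prediction, target_var).mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)
train_fn = theano.function([input_var, target_var], loss, updates=updates)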
I am not an expert in deep learning and CNNs, but the reason this may have worked is the following: with regression=False, if the error gradient is small, applying small changes to the network parameters may not change the predicted class or the associated loss, which may lead the algorithm to "think" it has converged. If instead you look at the class probabilities (regression=True), small parameter changes will affect the probabilities and the resulting mean squared error, so the network will continue down this path, which may eventually change the predictions.
This is just a guess; it is hard to tell without seeing the code and the dataset.

Related

Is it incorrect to change a model's parameters after training it?

I was trying to use average ensembling on a group of models I trained earlier (I am creating a new model in the ensemble for each pre-trained model I am using and then loading the trained weights onto it; I know this is inefficient, but I am just learning about it, so it does not really matter). I mistakenly changed some of the network's parameters when loading the models in the ensemble code, such as using ReLU instead of LeakyReLU, which I used when training the models, and a different value for an L2 regularizer in the dense layer of one of the models. This, however, gave me a better testing accuracy for the ensemble. Can you please explain to me if/how this is incorrect, and if it is normal, can I use this method to further enhance the accuracy of the ensemble?
I believe it is NOT correct to change a model's parameters after training it. By parameters here I mean the trainable parameters, such as the weights in a Dense layer, not hyper-parameters such as the learning rate.
What is training?
Training is essentially a loop that keeps changing, or updating, the parameters. It updates the parameters in a way that it believes will reduce the loss. It is like moving a point around a hyper-space to somewhere the loss function gives a small value.
A smaller loss generally means higher accuracy.
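To make that concrete, here is a toy illustration with a hypothetical quadratic loss (not your model): every update moves the parameters in the direction the gradient says will reduce the loss, and that is the only justification for the move.
import numpy as np
def grad_loss(theta):
    # Gradient of the toy loss L(theta) = sum((theta - 3)^2)
    return 2.0 * (theta - 3.0)
theta = np.zeros(5)        # starting point in the hyper-space
learning_rate = 0.1
for _ in range(100):
    # Each training step moves theta in the direction that locally reduces the loss.
    theta = theta - learning_rate * grad_loss(theta)
print(theta)               # converges towards 3.0 in every coordinate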
Changing Weights
So now, changing your parameter values, by mistake or on purpose, is like moving that point somewhere else, BUT you have no logical reason to believe that such a move will give you a smaller loss. You are just randomly wandering around that hyper-space, and in your case you were simply lucky to land on a point that happened to give you a smaller loss, or a better testing accuracy. It is purely luck.
Changing activation function
Also, altering the activation function from LeakyReLU to ReLU is like randomly altering the shape of your hyper-space. Even though you stay at the same point, the landscape changes, and you still have no logical reason to believe that such a change of landscape will give you a smaller loss at that point.
When you change the model manually, you need to retrain.
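As a concrete illustration, here is a minimal average-ensembling sketch in Keras (the file names model_a.h5 etc. and the array X_test are hypothetical): the point is that each model is loaded exactly as it was trained, with no swapped activations or regularizers.
import numpy as np
from tensorflow import keras
# load_model restores both the architecture and the trained weights, so there is
# no opportunity to accidentally swap LeakyReLU for ReLU or change an L2 factor.
model_paths = ["model_a.h5", "model_b.h5", "model_c.h5"]   # hypothetical paths
models = [keras.models.load_model(p) for p in model_paths]
# Average the predicted class probabilities across the ensemble, then take the argmax.
probs = np.mean([m.predict(X_test) for m in models], axis=0)
ensemble_classes = probs.argmax(axis=1)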
Though you changed the network's parameters when loading the models, it is not incorrect to alter the hyper-parameters of your ensemble's underlying models. In some cases, the models used in an ensemble method require unique tuning, which can, as you mentioned, give "you a better testing accuracy for the ensemble model."
To answer your second question: yes, you can use this method to further enhance the accuracy of the ensemble. You can also use Bayesian optimization, GridSearch, or RandomSearch if you prefer more automated means of tuning your hyperparameters.

How to give specific outputs higher priority in accuracy / loss reduction when training a neural network

So I am dealing with a simple neural network with 10 inputs and one output. I can have as many hidden layers as suggested, however I am using 2. I am also using "mean_squared_error" loss function and RMSProp optimizer.
Anyhow, the question I have is this: let's suppose my output values look like this:
[0,0,3,0,0,0,5,0,0,2,0...] etc. Note that the value 0 occurs far more often than the others. What I would love to do is force the neural network to learn the non-zero output values better, i.e. to give more "importance" to those values.
Because if I use 'mean_squared_error', the training will optimize over the entire dataset, which will mostly lead to optimizing the cases where 0 is the output value.
EDIT:
The problem I am dealing with could be simple modeling of a physical system. Let us say we have a black-box system with known inputs. This black box has a single output (let us say temperature). Based on our inputs and the corresponding outputs, we could model the system as a "black box" using a neural network, and then use the trained NN to predict the temperature.
EDIT:
So I am now using a different training/validation set, as I suspected there was something wrong with the previous one.
Now I get something like the image above (please note the immediate spike).
What could cause that?
Keep in mind that I am not experienced with NNs, so literally any feedback is welcome :)
There are two important concepts in ML, "underfitting" and "overfitting"; in your case I think it is underfitting.
To overcome this problem there are a few options:
Make your model more complex by adding more layers and units (see the sketch below)
If you are using regularization terms, decrease their values
Use more features (if there are any)
Hope this helps.
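For the first suggestion, a hypothetical Keras sketch (the question mentions 10 inputs, one output, mean_squared_error and RMSProp, so those are reused here) of a deeper and wider network than the original two hidden layers:
from tensorflow import keras
# A deeper / wider variant of the 10-input, 1-output regression network.
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(10,)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),   # extra hidden layer vs the original two
    keras.layers.Dense(1),                       # single regression output
])
model.compile(loss='mean_squared_error', optimizer='rmsprop')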
If your outputs are integers [0,0,3,0,0,0,5,0,0,2,0...], i.e. classes, you are probably doing classification, so your loss should be categorical_crossentropy. In this case, there are two ways of doing what you want:
1- You can use SMOTE (Synthetic Minority Oversampling Technique) so that the non-zero classes get the same weight as the zero class. For binary classes:
from imblearn.combine import SMOTEENN  # SMOTE oversampling combined with ENN cleaning
# Resample so the minority (non-zero) classes are balanced against the zero class
sm = SMOTEENN()
x, y = sm.fit_resample(X, Y)  # named fit_sample in older versions of imbalanced-learn
2- You can also adjust Keras class weights:
# Penalize errors on class 1 thirty times more heavily than errors on class 0
class_weight = {0: 1., 1: 30.}
model.fit(X, Y, epochs=1000, batch_size=16, class_weight=class_weight)  # nb_epoch in old Keras versions
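If you prefer to keep the problem as a regression (mean_squared_error on a single output), a similar effect can be achieved with per-sample weights instead of class weights. A hypothetical sketch, assuming X and Y are your training arrays and model is the compiled network; the factor of 30 is arbitrary and should be tuned:
import numpy as np
# Give every sample with a non-zero target more influence on the loss
# than the (much more frequent) zero-target samples.
sample_weight = np.where(Y != 0, 30.0, 1.0)
model.fit(X, Y, epochs=1000, batch_size=16, sample_weight=sample_weight)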

How to interpret Keras training loss without comparing it with validation loss?

I have several implementations of the same neural network, but each one with different starting parameters.
This is one of my plots, comparing the training loss of the base experiment with the training loss of another experiment.
I also have other examples:
Can anyone point me to some guidance on how to understand these outputs from Keras fit()? Note that I don't have any validation set.
Thanks
This is weird; your loss has weird spikes and even increases in value.
I can imagine a few reasons:
The functions you created are not continuous or have weird behavior, like spikes and other features that get in the way of decreasing the loss. This includes big contrasts between flat and steep regions.
You're using a weird custom optimizer
Your learning rate is too big
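If the learning rate is indeed the culprit, a quick hypothetical Keras sketch of the usual first fixes (X, Y and the model architecture are placeholders): set a smaller explicit learning rate, and optionally reduce it further whenever the training loss plateaus, monitoring 'loss' rather than 'val_loss' since there is no validation set.
from tensorflow import keras
# Smaller explicit learning rate instead of the optimizer default.
optimizer = keras.optimizers.RMSprop(learning_rate=1e-4)
model.compile(loss='mean_squared_error', optimizer=optimizer)
# Halve the learning rate whenever the *training* loss stops improving.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.5, patience=5)
history = model.fit(X, Y, epochs=100, callbacks=[reduce_lr])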

Optimizing for accuracy instead of loss in Keras model

If I correctly understand the significance of the loss function to the model, it directs the model to be trained by minimizing the loss value. So, for example, if I want my model to be trained to have the lowest mean absolute error, I should use MAE as the loss function. Why is it, then, that you sometimes see someone wanting to achieve the best possible accuracy, but building the model to minimize a completely different function? For example:
model.compile(loss='mean_squared_error', optimizer='sgd', metrics='acc')
How come the model above is trained to give us the best acc, when during its training it tries to minimize a different function (MSE)? I know that, once trained, the model's metric will report the best acc found during training.
My doubt is: shouldn't the focus of the model during its training be to maximize acc (or minimize 1/acc) instead of minimizing MSE? If done that way, wouldn't the model give us even higher accuracy, since it knows it has to maximize it during its training?
To start with, the code snippet you have used as example:
model.compile(loss='mean_squared_error', optimizer='sgd', metrics='acc')
is actually invalid (although Keras will not produce any error or warning) for a very simple and elementary reason: MSE is a valid loss for regression problems, for which accuracy is meaningless (it is meaningful only for classification problems, where MSE is not a valid loss function). For details (including a code example), see my own answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)?; for a similar situation in scikit-learn, see my own answer in this thread.
Continuing to your general question: in regression settings, usually we don't need a separate performance metric, and we normally use just the loss function itself for this purpose, i.e. the correct code for the example you have used would simply be
model.compile(loss='mean_squared_error', optimizer='sgd')
without any metrics specified. We could of course use metrics='mse', but this is redundant and not really needed. Sometimes people use something like
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['mse','mae'])
i.e. optimise the model according to the MSE loss, but show also its performance in the mean absolute error (MAE) in addition to MSE.
Now, your question:
shouldn't the focus of the model during its training be to maximize acc (or minimize 1/acc) instead of minimizing MSE?
is indeed valid, at least in principle (save for the reference to MSE), but only for classification problems, where, roughly speaking, the situation is as follows: we cannot use the vast arsenal of convex optimization methods in order to directly maximize the accuracy, because accuracy is not a differentiable function; so, we need a proxy differentiable function to use as loss. The most common example of such a loss function suitable for classification problems is the cross entropy.
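In code, that separation of roles looks like the following hypothetical Keras snippet: cross-entropy is the differentiable loss actually being minimized, while accuracy is only reported as a metric.
from tensorflow import keras
# Hypothetical small classifier with 20 input features and 10 classes.
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    keras.layers.Dense(10, activation='softmax'),
])
# The optimizer minimizes the cross-entropy proxy; accuracy is just monitored.
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])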
Rather unsurprisingly, this question of yours pops up from time to time, albeit with slight variations in context; see for example my own answers in
Cost function training target versus accuracy desired goal
Targeting a specific metric to optimize in tensorflow
For the interplay between loss and accuracy in the special case of binary classification, you may find my answers in the following threads useful:
Loss & accuracy - Are these reasonable learning curves?
How does Keras evaluate the accuracy?

TensorFlow RandomForest vs Deep learning

I am using TensorFlow to train a model which has 1 output for 4 inputs. The problem is a regression one.
I found that when I use a RandomForest to train the model, it quickly converges and also performs well on the test data. But when I use a simple neural network for the same problem, the loss (root mean squared error) does not converge; it gets stuck at a particular value.
I tried increasing/decreasing the number of hidden layers and increasing/decreasing the learning rate. I also tried multiple optimizers, and tried training the model on both normalized and non-normalized data.
I am new to this field, but the literature I have read so far vehemently asserts that the neural network should work at least marginally, if not categorically, better than the random forest.
What could be the reason behind non-convergence of the model in this case?
If your model is not converging, it means that the optimizer is stuck in a local minimum of your loss function.
I don't know what optimizer you are using, but try increasing the momentum or even the learning rate slightly.
Another strategy often employed is learning rate decay, which reduces your learning rate by a factor every few epochs. This can also help you avoid getting stuck in a local minimum early in the training phase, while still reaching maximum accuracy towards the end of training.
Otherwise you could try an adaptive optimizer (Adam, Adagrad, Adadelta, etc.) that takes care of the hyper-parameter selection for you.
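A hypothetical tf.keras sketch of both suggestions (the question does not show the actual training code, so model, X and y are placeholders): a simple step decay of the learning rate, or switching to an adaptive optimizer such as Adam.
from tensorflow import keras
# Step decay: halve the learning rate every 20 epochs.
def schedule(epoch, lr):
    return lr * 0.5 if epoch > 0 and epoch % 20 == 0 else lr
lr_decay = keras.callbacks.LearningRateScheduler(schedule)
# Alternatively (or additionally), use an adaptive optimizer such as Adam.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss='mse')
model.fit(X, y, epochs=200, callbacks=[lr_decay])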
This is a very good post comparing different optimization techniques.
Deep neural networks need a significant amount of data to perform adequately. Be sure you have lots of training data, or your model will overfit.
A useful rule when beginning to train models is not to start with the more complex methods; begin with, for example, a linear model, which you will be able to understand and debug more easily.
In case you continue with the current methods, some ideas:
Check the initial weight values (initialize them from a normal distribution)
As a previous poster said, diminish the learning rate
Do some additional checking on the data: check for NaNs and outliers, as the current model could be more sensitive to noise (see the sketch below). Remember: garbage in, garbage out.
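A short hypothetical sketch of those checks with NumPy and tf.keras (X and y stand in for the 4-input / 1-output training data):
import numpy as np
from tensorflow import keras
# Garbage in, garbage out: check the data before blaming the model.
assert not np.isnan(X).any() and not np.isnan(y).any(), "NaNs in the training data"
print("target range:", y.min(), y.max())   # eyeball obvious outliers
# Initialize weights from a normal distribution, as suggested above.
init = keras.initializers.RandomNormal(mean=0.0, stddev=0.05)
model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', kernel_initializer=init, input_shape=(4,)),
    keras.layers.Dense(1),                   # single regression output for the 4 inputs
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss='mse')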
