After fixing my code and preparing my data for training, I find myself facing two questions.
Background:
My data has two columns: a date (one entry per minute) and a congestion value (between 0 and 200). My goal is to feed it to my neural network so that I can predict the congestion at each minute of the next week. My dataset has more than 10M entries, so lack of training data shouldn't be a problem.
Problem:
I now have two questions. The first is about the loss, optimizer, and activation. There seem to be a number of them, and each has a domain where it works better than the others. Which would you recommend for this project? (Currently in my tests I use Adam as the optimizer, mean_square as the loss, and linear as the activation.)
My second question is more about an error that I have (it may be linked to my using the wrong loss/optimizer). When running my code (on 10,000 training samples for now), I get an accuracy of 0, a low loss (0.00X), and a bad prediction (not even close to reality). Do you have any idea where this could come from?
What you are trying to do is called time series prediction (given data at times t-n, t-(n-1), ..., t-1, predict the state at time t) and is generally a task for a recurrent neural network. Here is a great blog post by Andrej Karpathy on the topic that you should have a look at.
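To make this framing concrete, here is a minimal sketch of turning a series into (past window, next value) pairs. The function name make_windows, the window length n, and the sample values are illustrative assumptions, not part of the question, and the sketch uses plain Python lists rather than any particular library.

```python
def make_windows(series, n):
    """Split a series into (inputs, targets).

    Each input holds the n values just before time t,
    and the matching target is the value at time t.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    inputs, targets = [], []
    for t in range(n, len(series)):
        inputs.append(series[t - n:t])
        targets.append(series[t])
    return inputs, targets


# Toy congestion values, one per minute (made-up numbers).
congestion = [10, 12, 15, 20, 18, 14]
X, y = make_windows(congestion, n=3)
print(X)  # [[10, 12, 15], [12, 15, 20], [15, 20, 18]]
print(y)  # [20, 18, 14]
```

A recurrent network would consume each window as a sequence, while a plain MLP would see it as a flat vector of n features.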
About your two questions:
This is hard to answer, since the choice of optimizer depends heavily on the input data. Generally speaking, the network will converge no matter which optimizer you use; the time it takes to converge will differ, however. Adaptive learning-rate methods such as Adagrad, Adadelta, and Adam tend to converge slightly faster. Here is a good write-up of the different optimizers.
Basic neural networks (MLPs) don't do well with time series prediction. That would be an explanation for the low accuracy. However, I don't know why the loss would be so close to 0.
I implemented a CNN with 3 convolutional layers, with max pooling and dropout after each layer.
I noticed that when I trained the model for the first time it gave me 88% testing accuracy, but after retraining it a second time on the same training dataset it gave me 92% testing accuracy.
I cannot understand this behavior. Is it possible that the model overfitted in the second training run?
Thank you in advance for any help!
This is quite possible if you have not set a seed, with set.seed( ) in R or tf.random.set_seed(any_no.) in Python.
Well, I am no expert when it comes to machine learning, but I do know the math behind it. When you train a neural network, you are basically finding a local minimum of the loss function. This means that the end result depends heavily on the initial guess for all of the internal variables.
Usually the variables are randomized as an initial estimate, and you can therefore reach quite different results from running the training process multiple times.
That being said, when I studied the subject I was told that you usually reach similar results regardless of the initial guess of the parameters. However, it is hard to say whether 0.88 and 0.92 would be considered similar or not.
Hope this gives a somewhat possible answer to your question.
As mentioned in another answer, you can remove the randomness, both in the parameter initialization and in the shuffling of the data used for each epoch of training, by introducing a seed. This ensures that when you run it twice, everything gets "randomized" in exactly the same order. In TensorFlow this is done using, for example, tf.random.set_seed(1); the number 1 can be changed to any number to get a new seed.
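The effect of a seed can be sketched without any deep learning library. The example below uses Python's standard random module as a stand-in for a framework's random initializer; init_weights is a hypothetical helper, and the seeded generator plays the role that tf.random.set_seed plays in TensorFlow.

```python
import random


def init_weights(n, seed=None):
    """Draw n initial weights from a normal distribution.

    Hypothetical helper: a seeded generator makes the draw repeatable.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.1) for _ in range(n)]


run_a = init_weights(5, seed=1)
run_b = init_weights(5, seed=1)
run_c = init_weights(5, seed=2)

print(run_a == run_b)  # True: same seed, same starting point
print(run_a == run_c)  # False: different seed, different starting point
```

Two training runs that start from run_a and run_c can end in different local minima, which is one plausible source of an 88% versus 92% gap.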
I am using an LSTM for time-series prediction with Keras. I am using 3 LSTM layers with dropout=0.3, hence my training loss is higher than my validation loss. To monitor convergence, I am plotting the training loss and validation loss together. The results look like the following.
After researching the topic, I have seen multiple answers (for example [1][2]), but I have found several contradictory arguments in different places on the internet, which leaves me a little confused. I am listing some of them below:
1) An article by Jason Brownlee suggests that the validation and training curves should meet for convergence, and that if they don't, I might be under-fitting the data.
https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
2) However, the following answer on here suggests that my model has simply converged:
How do we analyse a loss vs epochs graph?
Hence, I am just a bit confused about the whole concept in general. Any help will be appreciated.
Convergence implies you have something to converge to. For a learning system to converge, you would need to know the right model beforehand. Then you would train your model until it was the same as the right model. At that point you could say the model converged! ... but the whole point of machine learning is that we don't know the right model to begin with.
So when do you stop training? In practice, you stop when the model works well enough to do what you want it to do. This might be when validation error drops below a certain threshold. It might just be when you can't afford any more computing power. It's really up to you.
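The two stopping rules just described (a validation-error threshold and a compute budget) can be sketched as a small check run after each epoch. The name should_stop, the threshold, the epoch budget, and the loss values are all illustrative assumptions rather than anything from the question.

```python
def should_stop(val_losses, threshold, max_epochs):
    """Return True once training should stop.

    Stops when the latest validation loss is below `threshold`
    or when the epoch budget `max_epochs` is used up.
    """
    if not val_losses:
        return False
    return val_losses[-1] < threshold or len(val_losses) >= max_epochs


# Made-up validation losses, one per epoch.
history = []
for loss in [0.9, 0.6, 0.4, 0.25, 0.2, 0.19]:
    history.append(loss)
    if should_stop(history, threshold=0.3, max_epochs=10):
        break

print(len(history))  # 4: stopped at the first loss below 0.3
```

Which threshold counts as "good enough" is a judgment about the application, not something the loss curve can tell you.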
I modelled an LSTM-based text generator using a data set I have. The purpose of the model is to predict the end of sentences. My training is showing a validation accuracy of around 81%. When reading through a couple of articles, I found that, unlike in a classification problem, I should be more worried about loss than accuracy. Is this the case, and if so, what would be an ideal loss value? Right now my loss is around 1.5+.
There is no minimum limit for accuracy in any machine learning or deep learning problem. As many say: garbage in, garbage out.
Good-quality data with a decent model will give you good accuracy.
Generally, accuracy benchmarks are set on standard datasets openly available on the internet, such as SQuAD, RACE, SWAG, GLUE, and many more.
Usually, state-of-the-art models are evaluated on these datasets and set an accuracy benchmark specific to each dataset.
Coming to your problem, you can tell whether the model is performing well based on accuracy and the evaluation metric you are using; in NLP, calculating loss is generally a bit tricky. In your case you are trying to predict the end of a sentence, where there is no fixed dimension, because the same information can be expressed in multiple ways with a varying number of words.
Looking at the validation and test accuracy of your model, it looks decent, but before pushing the accuracy further you should also worry about overfitting; the model should not be biased toward your data.
You can try different metrics to evaluate the model and compare the results yourself.
I hope this answers your question, Happy Learning!
I have several implementations of the same neural network, each with different starting parameters.
This is one of my plots, comparing the training loss of the base experiment with the training loss of another experiment.
I also have other examples:
Can anyone point me to some instructions on how to understand this output from the Keras fit()? Note that I don't have a validation set.
Thanks
This is weird: your loss has weird spikes and even increases in value.
I can imagine a few reasons:
The functions you created are not continuous or have weird behavior, like spikes and other things that might undermine the idea of steadily decreasing the loss. This includes big contrasts between flat and steep regions.
You're using a weird custom optimizer.
Your learning rate is too big.
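The last reason is easy to reproduce on a toy problem. The sketch below runs plain gradient descent on f(x) = x**2, which is an assumed stand-in for a real loss surface; descend and both learning rates are chosen purely for illustration.

```python
def descend(lr, steps=5, x=1.0):
    """Run gradient descent on f(x) = x**2 and return the loss per step."""
    losses = []
    for _ in range(steps):
        losses.append(x * x)
        grad = 2.0 * x      # derivative of x**2
        x = x - lr * grad   # standard gradient step
    return losses


small = descend(lr=0.1)
large = descend(lr=1.1)

print(small[0] > small[-1])  # True: the loss shrinks
print(large[0] < large[-1])  # True: the loss grows, each step overshoots
```

With lr=1.1 every update jumps past the minimum and lands farther away than it started, so the loss rises instead of falling. On a real network the same effect tends to show up as spikes rather than a clean blow-up.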
I am using TensorFlow to train a model which has 1 output for 4 inputs. It is a regression problem.
I found that when I use RandomForest to train the model, it quickly converges and also runs well on the test data. But when I use a simple neural network for the same problem, the loss (Random square error) does not converge. It gets stuck at a particular value.
I tried increasing/decreasing the number of hidden layers and increasing/decreasing the learning rate. I also tried multiple optimizers and tried to train the model on both normalized and non-normalized data.
I am new to this field but the literature that I have read so far vehemently asserts that the neural network should marginally and categorically work better than the random forest.
What could be the reason behind non-convergence of the model in this case?
If your model is not converging, it may mean that the optimizer is stuck in a local minimum of your loss function.
I don't know what optimizer you are using but try increasing the momentum or even the learning rate slightly.
Another strategy often employed is learning rate decay, which reduces your learning rate by a factor every several epochs. This can also help you avoid getting stuck in a local minimum early in the training phase, while achieving maximum accuracy towards the end of training.
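As a rough illustration of such a decay schedule, the sketch below multiplies the rate by a fixed factor once every few epochs. The function name step_decay and all the numbers are assumptions chosen for the example; frameworks ship their own schedule utilities.

```python
def step_decay(initial_lr, factor, every, epoch):
    """Learning rate in effect at `epoch`.

    The rate is multiplied by `factor` once per `every` epochs.
    """
    return initial_lr * factor ** (epoch // every)


# Print the rate at a few epochs: it halves at epoch 10 and again at 20.
for epoch in (0, 9, 10, 25):
    print(epoch, step_decay(initial_lr=0.1, factor=0.5, every=10, epoch=epoch))
```

The large early rate lets the optimizer move quickly across the loss surface, and the smaller later rate lets it settle.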
Otherwise, you could try an adaptive optimizer (Adam, Adagrad, Adadelta, etc.) that adjusts the learning rate for you during training.
This is a very good post comparing different optimization techniques.
Deep neural networks need a significant amount of data to perform adequately. Be sure you have lots of training data or your model will overfit.
A useful rule when starting to train models is not to begin with the more complex methods; start instead with something simpler, for example a linear model, which you will be able to understand and debug more easily.
In case you continue with the current methods, some ideas:
Check the initial weight values (initialize them with a normal distribution).
As a previous poster said, reduce the learning rate.
Do some additional checking on the data: check for NaN values and outliers, since the current models could be more sensitive to noise. Remember, garbage in, garbage out.
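The data check in the last item could look something like the sketch below, which uses only the standard library. check_column, the z-score rule, and the sample values are assumptions for illustration; a z-score cutoff is just one of several ways to define an outlier.

```python
import math
import statistics


def check_column(values, z_limit=3.0):
    """Return (indices of NaN entries, indices of outliers).

    An outlier here is a value more than `z_limit` standard
    deviations from the mean of the non-NaN values.
    """
    nan_idx = [i for i, v in enumerate(values) if math.isnan(v)]
    clean = [v for v in values if not math.isnan(v)]
    if len(clean) < 2:
        return nan_idx, []
    mean = statistics.mean(clean)
    std = statistics.pstdev(clean)
    if std == 0:
        return nan_idx, []
    outliers = [
        i for i, v in enumerate(values)
        if not math.isnan(v) and abs(v - mean) / std > z_limit
    ]
    return nan_idx, outliers


# Made-up feature column with one missing value and one extreme value.
column = [1.0, 1.1, 0.9, float("nan"), 1.0, 1.2, 0.8, 50.0, 1.0, 1.1]
nan_idx, outlier_idx = check_column(column, z_limit=2.0)
print(nan_idx)      # [3]
print(outlier_idx)  # [7]
```

The example passes z_limit=2.0 because in a sample this small a single extreme value cannot reach a z-score of 3; on a real dataset the default is a more common starting point.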