This question already has an answer here:
Is deep learning bad at fitting simple non linear functions outside training scope (extrapolating)?
(1 answer)
Closed 4 years ago.
I am new to machine learning, so I apologize in advance if this question has come up before; I haven't been able to find a satisfying answer to my problem.
As a pedagogical exercise I have been trying to train an ANN to predict a sine wave. My problem is that although my neural network learns the shape of the sine accurately on the training data, it fails on the validation set and for larger inputs. I start by defining my input and output as
import numpy as np
x = np.arange(800).reshape(-1,1) / 50
y = np.sin(x)/2
The rest of the code goes as
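from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential()
model.add(Dense(20, input_shape=(1,),activation = 'tanh',use_bias = True))
model.add(Dense(20,activation = 'tanh',use_bias = True))
model.add(Dense(1,activation = 'tanh',use_bias = True))
model.compile(loss='mean_squared_error', optimizer=Adam(learning_rate=0.005), metrics=['mean_squared_error'])
# validation_split takes the last 20% of x (x >= 12.8) before shuffling, so validation is already extrapolation
history = model.fit(x,y,validation_split=0.2, epochs=2000, batch_size=400, verbose=0)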
I then devise a test, which is defined as
x1= np.arange(1600).reshape(-1,1) / 50
y1 = np.sin(x1)/2
prediction = model.predict(x1, verbose=1)
So the problem is that the ANN clearly starts to fail on the validation set and does not predict a continuation of the sine wave.
Weird behaviour for validation set:
Failure to predict anything other than the training set:
So, what am I doing wrong? Is the ANN incapable of continuing the sine wave? I have tried to fine tune most of the available parameters without success. Most of the FAQs regarding similar issues are due to overfitting, but I haven't been able to solve this.
Congratulations on stumbling on one of the fundamental issues of deep learning on your first try :)
What you did is correct, and, indeed, the ANN (in its current form) is incapable of continuing the sine wave.
However, you can see signs of overfitting in your MSE graph, starting around epoch 800, when the validation error starts to increase.
As pointed out above, your NN is not capable of "grasping" the cyclic nature of your data.
You can think of a DNN made only of dense layers as a smarter version of linear regression. The reason to use a DNN is to have high-level, non-linear, abstract features that are "learned" by the network itself instead of being engineered by hand. On the other hand, these features are mostly hard to describe and understand.
So, in general, DNNs are good at predicting unknown points "in the middle" of the training range: the farther your x is from the training set, the less accurate the prediction will be. Again, in general.
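To make the extrapolation point concrete, here is a minimal sketch (not the trained model from the question). It builds a 1-20-20-1 tanh network with arbitrary random weights, which are an assumption for illustration, and evaluates it far from the origin. Each tanh unit saturates at +1 or -1 for large inputs, so the output becomes constant and cannot keep oscillating.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights for a 1-20-20-1 tanh network, the same layer sizes as in the question.
# The values are arbitrary; only the shape of the computation matters here.
W1, b1 = rng.normal(size=(1, 20)), rng.normal(size=20)
W2, b2 = rng.normal(size=(20, 20)), rng.normal(size=20)
W3, b3 = rng.normal(size=(20, 1)), rng.normal(size=1)

def forward(x):
    """Forward pass of the tanh network for a column vector x."""
    h1 = np.tanh(x @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    return np.tanh(h2 @ W3 + b3)

# Inputs very far from any plausible training range.
far = np.array([[1e6], [1e7], [1e8]])
out = forward(far)

# The first-layer units are saturated for all three inputs,
# so the outputs are numerically identical: the network is flat out here.
spread = float(out.max() - out.min())
print(out.ravel(), spread)
```

Training changes where the flat region starts, not the fact that it exists, which is one way to see why the prediction stops following the sine beyond the training data.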
To predict things that are cyclic in nature, you should either use more sophisticated architectures or pre-process your data, e.g. by identifying its "seasonality" or "base frequency".
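As a rough sketch of the pre-processing route, assume the base frequency is already known (here `omega = 1.0`, which is an assumption; in practice you would have to estimate it). Mapping x to sine and cosine features turns the problem into ordinary least squares, and the fit then continues correctly outside the training range. The helper name `periodic_features` is made up for this example.

```python
import numpy as np

# Training range from the question: x in [0, 16), target sin(x) / 2.
x_train = np.arange(800) / 50
y_train = np.sin(x_train) / 2

def periodic_features(x, omega=1.0):
    """Map x to [sin(omega x), cos(omega x), 1]; omega is assumed to be known."""
    return np.column_stack([np.sin(omega * x), np.cos(omega * x), np.ones_like(x)])

# Ordinary least squares on the engineered features.
coef, *_ = np.linalg.lstsq(periodic_features(x_train), y_train, rcond=None)

# Evaluate on [16, 32), which the fit never saw.
x_test = np.arange(800, 1600) / 50
y_pred = periodic_features(x_test) @ coef
max_err = float(np.max(np.abs(y_pred - np.sin(x_test) / 2)))
print(coef, max_err)
```

The same features could be fed to a neural network instead of a linear model; the point is only that the periodicity is supplied by the features rather than learned from raw x.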
This question already has answers here:
Is deep learning bad at fitting simple non linear functions outside training scope (extrapolating)?
(1 answer)
Predicting sine with ANN using Keras [duplicate]
(2 answers)
Unable to approximate the sine function using a neural network
(4 answers)
Approximating the sine function with a neural network
(3 answers)
Approximating sine function with Neural Network and ReLU
(2 answers)
Closed 1 year ago.
I have been experimenting with different kinds of ANNs to do regression on basic and increasingly complex functions. It seems to me, though, that I cannot get my network to learn cyclic functions like a sine wave. I have read on the web and on these forums that ANNs are generally not good at this job, but I can't seem to fathom why. Isn't learning any function within its domain the same?
For clarification, I am trying to fit a sine wave from x=0 to x=100 using the following setup:
import tensorflow as tf
from tensorflow import keras

def create_model():
    model = tf.keras.models.Sequential([
        keras.layers.Dense(units=1, activation=None,input_dim=1,kernel_initializer='random_normal'),
        keras.layers.Dense(units=64,activation='linear',use_bias=True),
        keras.layers.Dense(units=32,activation="relu",use_bias=True),
        keras.layers.Dense(units=64,activation="relu"),
        keras.layers.Dense(units=64,activation='linear',use_bias=True),
        keras.layers.Dense(units=32,activation='relu'),
        keras.layers.Dense(units=1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam',
                  loss='mean_squared_logarithmic_error',
                  metrics=['mean_squared_error'])
    return model

# Create a basic model instance
model = create_model()
# Display the model's architecture
model.summary()
I have rescaled my data to fit into the [x,y]=[0,1]^2 space and fed it into the network. I gave the network 1000 points and left it to train for many epochs (~100,000), and these are the results I got:
Overfitting
Predictions
I can understand that this is standard overfitting behavior, but I can't understand why it behaves as such. In Goodfellow's Deep Learning (which I am in the process of reading) he explains that the optimal behavior of a machine learning algorithm lies between the overfitting and underfitting regions. It seems, then, that the model I have created is not going to converge to the solution with more training and is expected to perform worse!
Does this mean it can't interpolate the sine function? Also, why is this function so much more demanding computationally than others (most simple functions I tried converged in <1000 epochs)? Does it mean it requires more layers, or maybe more units per layer? I understand the problem to be a classic regression problem, for which I thought sequential models were good.
Last but not least, I know that ANNs are not the way to go for periodic functions, but I am trying to understand why they struggle in this as a regression method.
I am using an LSTM for time-series prediction with Keras. I am using 3 LSTM layers with dropout=0.3, hence my training loss is higher than my validation loss. To monitor convergence, I am plotting the training loss and validation loss together. The results look like the following.
After researching the topic I have seen multiple answers (for example [1][2]), but I have found several contradictory arguments in different places on the internet, which makes me a little confused. I am listing some of them below:
1) An article by Jason Brownlee suggests that the validation and training curves should meet at convergence, and that if they don't, I might be under-fitting the data.
https://machinelearningmastery.com/diagnose-overfitting-underfitting-lstm-models/
https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
2) However, the following answer on this site suggests that my model has simply converged:
How do we analyse a loss vs epochs graph?
Hence, I am a bit confused about the whole concept in general. Any help will be appreciated.
Convergence implies you have something to converge to. For a learning system to converge, you would need to know the right model beforehand. Then you would train your model until it was the same as the right model. At that point you could say the model converged! ... but the whole point of machine learning is that we don't know the right model to begin with.
So when do you stop training? In practice, you stop when the model works well enough to do what you want it to do. This might be when validation error drops below a certain threshold. It might just be when you can't afford any more computing power. It's really up to you.
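The two stopping rules mentioned above (a validation-error threshold and a compute budget) can be sketched in a few lines. This is only an illustration: the function name `stopping_epoch` and the loss values are made up, and in a real training loop the losses would arrive one epoch at a time.

```python
def stopping_epoch(val_losses, threshold, max_epochs):
    """Return the 1-based epoch at which training would stop.

    Stops at the first epoch whose validation loss is below `threshold`,
    or at `max_epochs` if the budget runs out first.
    """
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < threshold or epoch >= max_epochs:
            return epoch
    return len(val_losses)

# Made-up validation losses, for illustration only.
losses = [0.90, 0.50, 0.30, 0.12, 0.08, 0.07]

print(stopping_epoch(losses, threshold=0.10, max_epochs=100))  # 5: threshold reached
print(stopping_epoch(losses, threshold=0.01, max_epochs=4))    # 4: budget exhausted
```

In Keras, the `EarlyStopping` callback covers a related rule (stop when the monitored value has not improved for a number of epochs), so you rarely need to write this loop by hand.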
So I am dealing with a simple neural network with 10 inputs and one output. I can have as many hidden layers as suggested, however I am using 2. I am also using "mean_squared_error" loss function and RMSProp optimizer.
Anyhow, the question I have is, lets suppose my output values are like this:
[0,0,3,0,0,0,5,0,0,2,0...] etc. Note that the value 0 repeats more often than the others. What I would love to do is force the neural network to learn better on the cases with non-zero values on the output side, i.e. to give more "importance" to those values.
Because if I use 'mean_squared_error', the training will try to optimize according to entire dataset, this will lead mostly to optimization of cases, where 0 is an output value.
EDIT:
The problem I am dealing with could be the simple modeling of a physical system. Let us say we have a black-box system with known inputs. This black box has a single output (let us say temperature). Based on our inputs and the corresponding outputs, we could model the system using a neural network as a "black box" and then use the trained NN to predict temperature.
EDIT:
So I am now using different training/validation set. I was suspecting that there is something wrong with the previous one.
Now I got something like the image above (please see the immediate spike)
What could cause that?
Keep in mind that I am not experienced in NNs, so literally any feedback is welcome :)
There are two important concepts in ML:
"underfitting" and "overfitting". In your case I think it's underfitting.
There are some ways to overcome this problem:
make your model more complex by adding more layers and units
if you are using regularization terms, decrease their values
use more features (if there are any)
Hope this helps.
If your outputs are integers [0,0,3,0,0,0,5,0,0,2,0...], i.e. classes, you are probably doing classification, so your loss should be categorical_crossentropy. In this case, there are two ways of doing what you want:
1- You can use SMOTE (Synthetic Minority Oversampling Technique) so that the non-zero classes get the same weight as the zero class. For binary classes:
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN
sm = SMOTEENN()
x, y = sm.fit_resample(X, Y)
2- You can also adjust Keras class weights:
class_weight = {0: 1.,1: 30.}
model.fit(X, Y, epochs=1000, batch_size=16, class_weight=class_weight)
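If the outputs are treated as continuous values rather than classes (as in the temperature example from the question), the same idea can be applied to the regression loss by weighting individual samples. The sketch below is an assumption-laden illustration, not a Keras API: `weighted_mse` and the weight of 10 are made up, and it only shows how a model that always predicts 0 is penalized more once non-zero targets are up-weighted.

```python
import numpy as np

def weighted_mse(y_true, y_pred, nonzero_weight=10.0):
    """MSE in which samples with a non-zero target count `nonzero_weight` times."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.where(y_true != 0, nonzero_weight, 1.0)
    return float(np.sum(weights * (y_true - y_pred) ** 2) / np.sum(weights))

y_true = [0, 0, 3, 0, 0, 0, 5, 0, 0, 2, 0]   # targets from the question
all_zero = np.zeros(len(y_true))             # a model that always predicts 0

plain = weighted_mse(y_true, all_zero, nonzero_weight=1.0)
weighted = weighted_mse(y_true, all_zero, nonzero_weight=10.0)
print(plain, weighted)
```

In Keras, `model.fit` accepts a `sample_weight` array, so passing an array built the same way as `weights` above should have a comparable effect without a custom loss.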
I am using TensorFlow for training model which has 1 output for the 4 inputs. The problem is of regression.
I found that when I use RandomForest to train the model, it quickly converges and also runs well on the test data. But when I use a simple Neural network for the same problem, the loss(Random square error) does not converge. It gets stuck on a particular value.
I tried increasing/decreasing number of hidden layers, increasing/decreasing learning rate. I also tried multiple optimizers and tried to train the model on both normalized and non-normalized data.
I am new to this field but the literature that I have read so far vehemently asserts that the neural network should marginally and categorically work better than the random forest.
What could be the reason behind non-convergence of the model in this case?
If your model is not converging, one common cause is that the optimizer is stuck in a local minimum (or on a plateau) of your loss function.
I don't know what optimizer you are using, but try increasing the momentum or even the learning rate slightly.
Another strategy that is often employed is learning rate decay, which reduces your learning rate by a factor every several epochs. This can also help you avoid getting stuck in a local minimum early in training, while still achieving maximum accuracy towards the end of training.
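A step decay schedule like the one described can be written as a small function. The parameter names and default values (`initial_lr`, `drop`, `epochs_per_drop`) are illustrative assumptions, not recommended settings.

```python
def step_decay(epoch, initial_lr=0.01, drop=0.5, epochs_per_drop=10):
    """Learning rate at `epoch`: multiplied by `drop` once every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)

# The rate stays constant within each block of 10 epochs, then halves.
for epoch in (0, 9, 10, 25):
    print(epoch, step_decay(epoch))
```

With Keras, a function of this shape can be passed to the `LearningRateScheduler` callback, which calls it with the epoch index at the start of each epoch.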
Otherwise you could try an adaptive optimizer (Adam, Adagrad, Adadelta, etc.) that adapts the step size for each parameter during training, which reduces the amount of learning-rate tuning you have to do.
This is a very good post comparing different optimization techniques.
Deep neural networks need a significant amount of data to perform adequately. Be sure you have lots of training data, or your model will overfit.
A useful rule when you start training models is not to begin with the more complex methods. Begin with something simpler, for example a linear model, which you will be able to understand and debug more easily.
In case you continue with the current methods, some ideas:
Check the initial weight values (init them with a normal distribution)
As a previous poster said, diminish the learning rate
Do some additional checking on the data: check for NaN values and outliers, since the current models could be more sensitive to noise. Remember, garbage in, garbage out.
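The data check in the last item could look roughly like this. It is a sketch under assumptions: `data_report` is a made-up helper, the input is a 2-D feature array, and "outlier" is taken to mean more than 3 standard deviations from the column mean, which is only one of several possible definitions.

```python
import numpy as np

def data_report(X, z_threshold=3.0):
    """Count NaNs and list rows where any feature is more than
    `z_threshold` standard deviations from its column mean."""
    X = np.asarray(X, dtype=float)
    nan_count = int(np.isnan(X).sum())
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    z = np.abs((X - mean) / std)
    # NaN entries are counted above; treat them as z = 0 here.
    outlier_rows = np.where(np.nan_to_num(z, nan=0.0).max(axis=1) > z_threshold)[0]
    return nan_count, outlier_rows.tolist()

# Toy data: 20 rows, 2 features, one extreme value and one missing value.
X = np.ones((20, 2))
X[5, 0] = 100.0
X[7, 1] = np.nan

print(data_report(X))
```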
I'm getting started with machine learning tools and I'd like to learn more about what the heck I'm doing. For instance, the script:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, BatchNormalization
from keras.initializers import RandomUniform
import numpy
numpy.random.seed(13)
RandomUniform(seed=13)
model = Sequential()
model.add(Dense(6, input_dim=6))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.01))
model.add(Dense(11))
model.add(Activation('tanh'))
model.add(Dropout(0.01))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(optimizer='sgd', loss='mean_absolute_error', metrics=['accuracy'])
data = numpy.loadtxt('train', delimiter=' ')
X = data[:, 0:6]
Y = data[:, 6]
model.fit(X, Y, batch_size=1, epochs=1000)
data = numpy.loadtxt('test', delimiter=' ')
X = data[:, 0:6]
Y = data[:, 6]
score = model.evaluate(X, Y, verbose=1)
print ('\n\nThe error is:\n', score, "\n")
print('\n\nPrediction:\n')
Y = model.predict(X, batch_size=1, verbose=1)
print('\nResult:\n', Y, '\n')
It's a Frankenstein I made from some examples I found on the internet and I have many unanswered questions about it:
The file train has 60 rows. Is 1000 epochs too little? Is it too much? Can I get an Underfit/Overfit?
What does the result I get from model.evaluate() mean? I know it's the loss but, if I get a [7.0506157875061035, 0.0], does it mean that my model has a 7% error?
And last, I'm getting a prediction of 0.99875391, 0.99875391, 0.9362126, 0.99875391, 0.99875391, 0.99875391, 0.93571019 when the expected values were anything close to 7.86, 3.57, 8.93, 6.57, 11.7, 8.53, 9.06, which means it's a real bad prediction. Clearly there's a lot of things I'm doing wrong. Could you guys give me a few pointers?
I know it all depends on the type of data I'm using, but is there anything I shouldn't do at all? Or maybe something I should be doing?
1
There is never a ready answer for how many epochs is a good number. It varies wildly depending on the size of your data, your model, and what you want to achieve. Normally, small models require fewer epochs and bigger models require more. Yours seems small enough, and 1000 epochs seems way too much.
It also depends on the learning rate, a parameter given to the optimizer that defines how large a step your model takes when it updates its weights. Bigger learning rates mean fewer epochs, but there is a chance that you never find a good point because each update overshoots it. Smaller learning rates mean more epochs, but the updates are more stable.
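The trade-off is easy to see on a toy problem. The sketch below minimizes f(w) = w**2 with plain gradient descent; the function and the three learning rates are arbitrary choices for illustration and have nothing to do with your model.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimise f(w) = w**2 with plain gradient descent; returns the final w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # the gradient of w**2 is 2w
    return w

# Small rate: still far from 0 after 50 steps.
# Moderate rate: very close to 0.
# Too large: every step overshoots and |w| grows.
for lr in (0.01, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```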
Normally, if the loss reaches a limit, you're approaching a point where training is not useful anymore. (Of course, there may be problems with the model too, there is really no simple answer for this one).
To detect overfitting you need, besides the training data (X and Y), another group of test data (say Xtest and Ytest, for instance).
Then you use it in model.fit(X,Y, validation_data=(Xtest,Ytest), ...)
Test data is not given for training, it's kept separate just to see if your model can predict good things from data it has never seen in training.
If the training loss goes down, but the validation loss doesn't, you're overfitting (roughly, your model is capable of memorizing the training data without really understanding it).
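That pattern can be checked mechanically from the two loss curves. The helper below is a hypothetical sketch (`overfitting_onset` is not a Keras function) and the loss values are made up; with Keras you would read the curves from `history.history['loss']` and `history.history['val_loss']` after calling `fit` with validation data.

```python
def overfitting_onset(train_loss, val_loss):
    """Return the 0-based epoch with the lowest validation loss if the
    training loss kept falling after it (the pattern described above);
    return None if the validation loss was still at its best at the end."""
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    is_last = best == len(val_loss) - 1
    if not is_last and train_loss[-1] < train_loss[best]:
        return best
    return None

# Made-up curves: training loss keeps falling, validation loss turns upward.
train = [1.00, 0.60, 0.40, 0.30, 0.22, 0.15]
val = [1.10, 0.70, 0.55, 0.58, 0.66, 0.80]

print(overfitting_onset(train, val))  # 2
```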
An underfit, on the contrary, happens when you never achieve the accuracy you expect (of course we always hope for 100% accuracy with no mistakes, but good models get somewhere in the 90s; some applications do better, around 99%, and some worse; again, it's very subjective).
2
model.evaluate() gives you the losses and the metrics you added in the compile method.
The loss value is something your model will always try to decrease during training. It roughly means how distant your model is from the exact values. There is no rule for what the loss value means, it could even be negative (but usually keras uses positive losses). The point is: it must decrease during training, that means your model is evolving.
The accuracy value means how many right predictions your model outputs compared to the true values (Y). It seems your accuracy is 0%, your model is getting everything wrong. (You can see that from the values you typed).
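Using the seven predictions and expected values you posted, both numbers can be recomputed by hand. The exact-match count below is a simplified stand-in for an accuracy metric (Keras may round or threshold before comparing), so treat it as an assumption.

```python
predicted = [0.99875391, 0.99875391, 0.9362126, 0.99875391,
             0.99875391, 0.99875391, 0.93571019]
expected = [7.86, 3.57, 8.93, 6.57, 11.7, 8.53, 9.06]

# Mean absolute error: the average distance between prediction and target,
# expressed in the same units as Y (not a percentage).
mae = sum(abs(p - e) for p, e in zip(predicted, expected)) / len(expected)

# Fraction of exact matches: a simplified stand-in for an accuracy metric.
exact_matches = sum(p == e for p, e in zip(predicted, expected)) / len(expected)

print(mae, exact_matches)
```

The mean absolute error comes out at about 7.05, which is the first number you got from model.evaluate(): on average the predictions are about 7.05 units away from the targets, not 7% off.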
3
In your model, you used activation functions. These normalize the results so they don't get too big. This avoids overflowing problems, numeric errors propagating, etc.
It's very very usual to work with values within such bounds.
tanh - outputs values between -1 and 1
sigmoid - outputs values between 0 and 1
Well, if you used a sigmoid activation in the last layer, your model will never output 3 for instance. It tries, but the maximum value is 1.
What you should do is prepare your data (Y), so it's contained between 0 and 1. (This is the best to do in classification problems, often done with images too)
But if you actually want numerical values, then you should just remove the activation and let the output be free to reach higher values. (It all depends on what you want to achieve with your model)
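If you go the first route and keep the sigmoid, a min-max scaling of Y is one way to do it. This is a sketch using the expected values from the question as stand-in targets; in practice `y_min` and `y_max` should come from the training targets only, and the helper name `to_original_units` is made up.

```python
import numpy as np

y = np.array([7.86, 3.57, 8.93, 6.57, 11.7, 8.53, 9.06])  # targets from the question

y_min, y_max = y.min(), y.max()
y_scaled = (y - y_min) / (y_max - y_min)  # now in [0, 1], reachable by a sigmoid

def to_original_units(pred_scaled):
    """Invert the min-max scaling to read predictions in the original units."""
    return pred_scaled * (y_max - y_min) + y_min

restored = to_original_units(y_scaled)
print(y_scaled.min(), y_scaled.max(), restored)
```

You would train on `y_scaled` and pass the output of model.predict() through `to_original_units` before comparing it with the real values.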
An epoch is a single pass through the full training set. In my mind 1000 seems like a lot, but you'd have to check for overfitting and evaluate the predictions. There are many ways of checking and controlling for overfitting in a model. If you understand the methods of doing so from here, coding them in Keras should be no problem.
According to the documentation .evaluate returns:
Scalar test loss (if the model has no metrics) or list of scalars (if the model computes other metrics)
so these are the evaluation metrics of your model, they tell you how good your model is given some notion of good. Those metrics depend on the model and type of data that you've used. Some explanation on those can be found here and here. As mentioned in the documentation,
The attribute model.metrics_names will give you the display labels for the scalar outputs.
So you can know what metric you are looking at. It is easier to do that interactively through the console (ipython, bpython) or Jupyter notebook.
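For instance, pairing the labels with the values makes the output self-describing. The two lists below are stand-ins for model.metrics_names and the result of model.evaluate(); the label strings are an assumption, since they depend on the Keras version.

```python
# Stand-ins for model.metrics_names and the list returned by model.evaluate().
# The exact label strings depend on the Keras version and are assumed here.
metrics_names = ["loss", "accuracy"]
score = [7.0506157875061035, 0.0]

# Pair each label with its value so it is clear which number is which.
labelled = dict(zip(metrics_names, score))
print(labelled)
```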
I can't see your data, but if you are doing a classification problem, as suggested by metrics=['accuracy'], then loss=mean_absolute_error doesn't make sense, since it is made for regression problems. To learn more about those, I refer you to here and here, which discuss classification and regression problems with Keras.
PS: question 3 is not related to software per se, but to the theoretical construct supporting the software. In such cases, I'd recommend asking them at Cross Validated.