I have a neural network set up in TensorFlow (in Python) that operates on the FER2013 dataset (available on Kaggle). My network architecture is:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam

emotion_model = Sequential()
emotion_model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48, 48, 1)))
emotion_model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))
emotion_model.add(Dropout(0.25))
emotion_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))
emotion_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))
emotion_model.add(Dropout(0.25))
emotion_model.add(Flatten())
emotion_model.add(Dense(1024, activation='relu'))
emotion_model.add(Dropout(0.5))
emotion_model.add(Dense(7, activation='softmax'))

emotion_model.compile(loss='categorical_crossentropy',
                      optimizer=Adam(lr=0.0001, decay=1e-6),
                      metrics=['accuracy'])

emotion_model_info = emotion_model.fit(
    train_generator,
    steps_per_epoch=28709 // 64,   # training samples // batch size
    epochs=50,
    validation_data=validation_generator,
    validation_steps=7178 // 64)   # validation samples // batch size
I plotted learning curves for this model (training vs. validation accuracy and loss over the epochs).
Now, I am a beginner in machine learning, but this divergence between training and validation accuracy/loss would seem to point to overfitting. However, looking at other people's results on the same dataset, most get around 62% validation accuracy (which is what I currently have) and usually about the same for training accuracy. So I'm surprised that my model performs so well on the training data (indicating overfitting) while my validation accuracy is merely on par with other implementations.

My question is two-fold. First, is there anything wrong with my implementation that could be causing the model to perform so well on training but only average on validation (with no real room for improvement), or is this just classic overfitting? If it is overfitting, I would appreciate advice on how to counter it. My dataset is mostly fixed (I could try to add more data if needed), and I've tried adding some regularization, but that hurt performance. Basically, I feel like I'm missing something here. It strikes me as suspicious that my training accuracy is so high, and I wanted to check before sinking time into correcting the overfitting. Any help is appreciated.
You're quite correct: this is the very definition of over-fitting.
Validation and training losses diverge
Validation and training accuracies diverge
Validation loss later increases
In general, we also expect that the validation loss will reach a relative minimum at about the same point -- this defines the convergence point. Here, it seems that, of the many things the model learns in training, a few are still useful after the divergence point around epoch 8.
The next items to consider are:
What makes you think you need to train for 50 epochs? Is it possible that epoch 8 is a reasonable place to halt training? (See the early-stopping sketch below.)
Have you checked your dataset to ensure that the training set is, indeed, properly representative of the entire dataset?
Consider cross-validation to check the viability of your data splits.
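On the early-stopping point: rather than hard-coding an epoch count, you can let tf.keras stop training automatically. A minimal sketch, reusing the fit call from the question (the patience value here is illustrative, not tuned for this problem):

from tensorflow.keras.callbacks import EarlyStopping

# Stop once validation loss has not improved for a few epochs,
# and roll back to the weights from the best epoch.
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

emotion_model_info = emotion_model.fit(
    train_generator,
    steps_per_epoch=28709 // 64,
    epochs=50,                      # upper bound; training may stop earlier
    validation_data=validation_generator,
    validation_steps=7178 // 64,
    callbacks=[early_stop])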
I don't know much about this specific dataset, but, generally speaking, a deep learning model will always converge to fit the training data closely given that it is strong enough (has enough neurons) to do so.
You can always use dropout layers in between your main layers; a dropout layer randomly zeroes out a fraction of the activations passed to the next layer during training. Just do:
emotion_model.add(tensorflow.keras.layers.Dropout(dropout_rate))
You could also try using an L1 and/or L2 norm penalty, which adds the weights of a layer to the final loss. This means the model can't give any single feature a very large weight, which reduces overfitting. Just add a kernel_regularizer argument to layers that contain weights, like:
emotion_model.add(Conv2D(64, kernel_size=(3, 3), activation='relu',
                         kernel_regularizer=keras.regularizers.l1()))
I have been working with synthetically produced data consisting of samples of shape 4x1745 and two labels, each of which can take one of 120 classes. The total number of possible class combinations comes out to 7140.
I was able to train decision tree models on this data, achieving a test accuracy of 20% and a train accuracy of 88%.
I have built a CNN model with the following layers:
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = keras.Sequential()
model.add(Conv2D(16, kernel_size=(3, 3), activation='elu'))
model.add(MaxPooling2D())
model.add(Conv2D(32, kernel_size=(3, 3), activation='elu'))
model.add(MaxPooling2D())
model.add(Conv2D(64, kernel_size=(3, 3), activation='elu'))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(128, activation='elu'))
model.add(Dense(120, activation='softmax'))
I have compiled the model with the Adam optimizer, a learning rate of 0.0001, and categorical crossentropy as the loss function.
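For reference, that compile step would look roughly like this (a sketch of what the description above implies, assuming the standard tf.keras API):

from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])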
The problem I am facing is that the loss eventually explodes and keeps increasing exponentially with each epoch.
I have tried using different learning rates but they just delay the time before the loss explodes.
I changed the number of layers in the model, which didn't stop the loss from exploding.
I have even reshaped the samples into 119x60 thinking that maybe the CNN was unable to catch any patterns when the samples are so long, but it doesn't help.
I have also tried changing the activation functions and the batch sizes.
And finally, I tried using a plain fully connected network (ANN) as well, which led to the same problem.
Any help is highly appreciated.
I'm working on a classification problem (human activity classification) and I used a CNN. The model code is:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dropout, MaxPool2D, Dense, Flatten
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(Conv2D(100, (2, 2), activation='relu', input_shape=X_train[0].shape))
model.add(Dropout(0.1))
# pooling layer
model.add(MaxPool2D(2, 2))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(7, activation='softmax'))
Compiling and fitting:
model.compile(optimizer=Adam(learning_rate = 0.001), loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
history = model.fit(X_train, y_train, epochs = 20, validation_data= (X_test, y_test), verbose=1)
The accuracy curve rises quite quickly. How could I increase the final accuracy value, and why does the curve increase so fast?
There are a few avenues you can pursue here, specifically finding answers to the following questions for your particular problem. Here's a great video; it isn't about TensorFlow, but I think the question you are asking is general enough for it to apply.
What is the right amount of time to train for? Likely the answer is somewhere between 20 and 90 epochs; more specifically, it's where the two series in your plot start to diverge. In other words, your model starts to memorize the training data at the point of divergence. TensorFlow has early-stopping mechanisms to help with this.
What is the performance of a naïve guesser? Is the complexity of your model proportional to the complexity/dimensionality of the problem?
What is the human insight that you can bring to the problem? Are there things you can do to the features that will help the model create separability in higher dimensions? For example, let's say your model is going to predict what activity a person is going to do at a given point in time. In this case, information related to people might be separate from time and activity data. You can create features that represent combinations of other features (assuming you have a lot of data), and encode this and feed it to your model. You can create embeddings in your model to get your model to deal with the sparsity that occurs when you combine such categorical features.
Another aspect of this that I think is very important to answer is "Why am I solving this problem?". In some cases, the answer might be "I want to learn X", in which case you might approach it differently. For example, if it's all tabular data, you might have more interpretable/better results using something like scikit-learn using a tree based model. It also, of course, depends on the amount and type of data you have. Nested cross-validation can give you great insight into what are the combinations of hyperparameters and features that will produce a model that generalizes, and also about the variation you can expect to see on unseen data.
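As a rough sketch of what nested cross-validation looks like with scikit-learn (the estimator, parameter grid, and synthetic data below are placeholders for illustration, not recommendations for this specific problem):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # placeholder data

# Inner loop: hyperparameter search; outer loop: unbiased performance estimate.
inner_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={'n_estimators': [50, 100], 'max_depth': [None, 10]},
    cv=3)
outer_scores = cross_val_score(inner_search, X, y, cv=5)

# Mean gives the generalization estimate; std gives the variation to expect on unseen data.
print(outer_scores.mean(), outer_scores.std())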
Best of luck!
I am using deep learning with Keras for multi-label text classification. However, the accuracy I am getting is only between 73% and 75%. I think I am misjudging one of the parameters here. Is there a way to improve this? (By the way, the number of rows I have is 50,858.)
Here is the code I am using for building and fitting the model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dropout, Conv1D, GlobalMaxPool1D, Dense, Activation
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

filter_length = 64
num_classes = 39

model = Sequential()
model.add(Embedding(max_words, 39, input_length=maxlen))
model.add(Dropout(0.3))
model.add(Conv1D(filter_length, 3, padding='same', activation='relu', strides=1))
model.add(GlobalMaxPool1D())
model.add(Dense(num_classes))
model.add(Activation('sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

callbacks = [
    ReduceLROnPlateau(),
    EarlyStopping(patience=4),
    ModelCheckpoint(filepath='model-conv1d1.h5', save_best_only=True)
]

history = model.fit(x_train, y_train,
                    # class_weight=class_weight,
                    epochs=100,
                    batch_size=10,
                    validation_split=0.1,
                    callbacks=callbacks)
It's hard to give an answer without knowing the data and the results of different trials. What you have to do is tune your hyperparameters, either automatically or by hand.
Here are some experiments I would try:
Increase the dimensionality of the embedding layer (this allows it to contain more information), or use pre-trained word embeddings such as GloVe
Experiment with different parameters for the Conv1D layer
Change the Conv1D layer to a recurrent one (e.g. LSTM, GRU); these normally work well with sequences
Change the global max pooling to (local) max pooling
Increase the batch size
Add one extra layer to the network
Don't forget to keep track of your experiments: one great library for this is MLflow. In this case, you might want to turn the model definition and compilation into a function (e.g. def build_model(**kwargs):) where your design decisions are controlled by the arguments. This helps you achieve more readable and loggable code (in addition to working really well with automated hyperparameter tuning).
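A minimal sketch of that pattern based on the model above (the argument names and the defaults for max_words and maxlen are illustrative placeholders, not tuned values):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dropout, Conv1D, GlobalMaxPool1D, Dense, Activation

def build_model(embedding_dim=39, filter_length=64, kernel_size=3, dropout=0.3,
                num_classes=39, max_words=10000, maxlen=100):
    # Every design decision is an argument, so each experiment run can be logged.
    model = Sequential()
    model.add(Embedding(max_words, embedding_dim, input_length=maxlen))
    model.add(Dropout(dropout))
    model.add(Conv1D(filter_length, kernel_size, padding='same', activation='relu', strides=1))
    model.add(GlobalMaxPool1D())
    model.add(Dense(num_classes))
    model.add(Activation('sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model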
Finally, ensure your classes are balanced (over/undersample if they are not) and randomly shuffled when fed into the model. If they are imbalanced, also consider using e.g. ROC AUC instead of accuracy as the metric for tracking model performance.
Here's my LSTM model to classify hand gestures. Initially, I had 1960 training samples of shape (num_sequences, num_joints, 3), which I reshape to (num_sequences, num_joints*3).
Here's my model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, Bidirectional, LSTM, Dropout, BatchNormalization, Dense
from tensorflow.keras.optimizers import Adam

input_shape = (trainx.shape[1], trainx.shape[2])

print("Build LSTM RNN model ...")
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(171, 66)))
model.add(Bidirectional(LSTM(units=256, activation='tanh', return_sequences=True, input_shape=input_shape)))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Bidirectional(LSTM(units=128, activation='tanh', return_sequences=True)))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Bidirectional(LSTM(units=128, activation='tanh', return_sequences=False)))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Dense(units=trainy.shape[1], activation="softmax"))

print("Compiling ...")
# Keras optimizer defaults:
# Adam   : lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8, decay=0.
# RMSprop: lr=0.001, rho=0.9, epsilon=1e-8, decay=0.
# SGD    : lr=0.01, momentum=0., decay=0.
opt = Adam()
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
I get 90% accuracy on the training set and 50% on the test set.
Overfitting is quite common in deep learning.
To circumvent overfitting with your LSTM architecture, try the following things in this order:
Decrease the learning rate from 0.1 or 0.01 to 0.001, 0.0001, or 0.00001.
Reduce the number of epochs. You can plot the training and validation accuracy as a function of the number of epochs and see when the training accuracy becomes larger than the validation accuracy; that is the number of epochs you should use. Combine this with the first step of decreasing the learning rate.
Then you can try to modify the architecture of the LSTM. You have already added dropout (maximum value 0.5); I would suggest trying 0.2 or 0.3. You have 3 cells, which is better than 2, and the number of nodes looks reasonable. What embedding dimension are you currently using? Since you are overfitting, it is worth trying to reduce the number of cells from 3 to 2 while keeping the same number of nodes.
The batch size might matter, as might the distribution of subclasses in your dataset. Is the dataset equally distributed and equally balanced between the training and validation sets? What I mean is that if one hand gesture is over-represented in the training set compared to the validation set, that might be a problem. A good strategy to overcome this is to keep part of the data aside as a test set, then do a train/validation cross-validation split using sklearn (5 folds), train your architecture on each split separately (5 times), and compare the training and validation accuracies. If there is a big bias in the split or among the sets, you will be able to identify it this way.
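A rough sketch of that check with scikit-learn (assuming X holds the flattened sequences and y the integer gesture labels; both names are placeholders for your own arrays):

import numpy as np
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # Stratification keeps the per-gesture class proportions similar in every split.
    # Train a fresh copy of the model on each fold and compare train vs. validation accuracy.
    train_counts = np.bincount(y[train_idx])
    val_counts = np.bincount(y[val_idx])
    print(f"fold {fold}: train class counts {train_counts}, val class counts {val_counts}")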
Last, you can try augmentation, specifically rotation and horizontal/vertical flip. This library might help https://github.com/aleju/imgaug
Hope this helps!
How do you know the network is overfitting rather than there being some kind of error in your dataset? Does the validation loss improve initially up to some epoch and then plateau or start to increase? Then it is overfitting. If it starts at 50% and stays there, it is not an overfitting problem. With the amount of dropout you have, overfitting does not look very likely.

How did you select your validation set? Was it randomly selected from the overall dataset, or did you do the selection yourself? It is always better to select the data randomly so that its probability distribution mirrors that of the training data. As said in the comments, please show your code for model.fit; there could be a problem there. How do you input the data? Did you use generators? A 50% validation accuracy leads me to suspect some error in how your validation data is provided to the network, or some error in the labeling of the validation data.

I would also recommend dynamically adjusting your learning rate based on monitoring the validation loss. Keras has a callback for this called ReduceLROnPlateau. Documentation is here. Set it up to monitor validation loss. I set the parameters patience=3 and factor=.5, which seems to work well. You can think of training as descending into a valley: as you descend, the valley gets narrower. If the learning rate is too large and remains fixed, you won't be able to reach further down toward the minimum. This should improve your training accuracy, which should result in improvement of validation accuracy. As I said, with the level of dropout you have I do not think it is overfitting, but if it is, you can also use Keras regularizers to help avoid overtraining. Documentation is here.
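A minimal sketch of that callback setup with the parameters mentioned above (since the fit code was not shown, the data arguments and validation_split here are placeholders for whatever setup you already use):

from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate whenever validation loss has not improved for 3 epochs.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)

model.fit(trainx, trainy,
          epochs=50,
          validation_split=0.2,   # placeholder; keep your existing validation setup
          callbacks=[reduce_lr])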
I am attempting to use keras to build an activity classifier from accelerometer signals. However, I am experiencing extreme overfitting of the data even with the most simplistic of models.
The input data has shape (10, 3) and contains roughly 0.1 seconds of data from the accelerometer in 3 dimensions. The model is simply:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

model = Sequential()
model.add(Flatten(input_shape=(10, 3)))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
The model should output the label [1,0] for walking activities and [0,1] for non-walking activities. After training I get 99.8% accuracy (if only it was real...). When I attempt to predict on data that wasn't used for training, I get 50% accuracy, verifying that the net isn't really "learning" anything except to predict a single class value.
The data is being prepared from 100 Hz triaxial accelerometer signals. I am not preprocessing the data in any way except for windowing it into bins of length 10 that overlap the previous bin by 50%. What measures can I take to make the network produce actual predictions? I have tried increasing the window size, but the results remain the same. Any advice/general tips are greatly appreciated.
Ian
Try adding some hidden layers and dropout layers to your network. You could create a simple multi-layer perceptron (MLP) with a couple of extra lines between your Flatten layer and final Dense layer:
from tensorflow.keras.layers import Dropout

# These hidden layers slot in between Flatten() and the final Dense(2, activation='softmax'):
model.add(Dense(64, activation='relu', input_dim=30))   # 30 features = flattened (10, 3) window
model.add(Dropout(0.25))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.1))
Or check out this guide, which explains how to create a simple MLP.
Without any hidden layers, your model will not really be 'learning' from the input data; it will only map the input features directly to the output classes.
The more layers you add, the more intermediate features and patterns the model should extract from the input data, which should lead to better predictions on test data. There will be a lot of trial and error in designing the best model, as too many layers can result in overfitting.
You have not provided information about how you train the model, so that may be the cause of the issue as well. You must ensure that the data is split into training, testing, and validation sets. Some possible split ratios for training, validation, and test data are 60%:20%:20% or 70%:15%:15%. This is ultimately something you must decide.
The problem of overfitting was caused by the input data type. The values passed to the classifier should have been float values with 2 decimal places. Somewhere along the way, some of these values had been augmented and had significantly more than 2 decimal places. That is, the input should have looked like
[9.81, 10.22, 11.3]
but instead looked like
[9.81000000012, 10.220010431, 11.3000000101]
The classifier was making its prediction based on this feature, which is obviously not the desired behavior! Lesson learned: make sure the data preparation is consistent! Thanks to @umutto for the suggestion of random forests; the simple structure was helpful for diagnostic purposes.
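If your pipeline can't guarantee that consistency, one option is to normalize the precision explicitly during preparation, along these lines (a sketch; prepare_windows is a hypothetical helper, and the 2-decimal precision is just the convention described above):

import numpy as np

def prepare_windows(raw_windows):
    # Round every accelerometer value to 2 decimal places so that no window
    # leaks extra-precision values the classifier could latch onto.
    return np.round(np.asarray(raw_windows, dtype=np.float64), decimals=2)

print(prepare_windows([[9.81000000012, 10.220010431, 11.3000000101]]))
# -> [[ 9.81 10.22 11.3 ]]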