I built an algorithm in Python for classifying data sets with Keras. It's a very simple LSTM network with one input layer, one hidden layer (LSTM), and one dense output layer.
My data consists of analog measurements: 63 sets for training and 36 sets for testing, each set having 3 channels with 19200 samples per channel, so (following what I understood from the documentation) the input shapes I needed were x = (63,19200,3) and y = (36,19200,3). (If you want additional information about the type of data, I can explain more.)
My code is as follows:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.utils import shuffle
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Input
from keras.layers import Dropout
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras import initializers
from keras import optimizers
# Fix random seed for reproducibility.
np.random.seed(1)
# Loading data (shapes: X_test (36,19200,3), y_test (36,3), X_train (63,19200,3), y_train (63,3))
(X_test, y_test), (X_train, y_train) = np.load('path.npy',allow_pickle=True)
data = [(X_test, y_test), (X_train, y_train)]
# Manually separating the validation data.
x_val = X_train[-10:]
y_val = y_train[-10:]
X_train = X_train[:-10]
y_train = y_train[:-10]
# Creating model.
model = Sequential()
model.add(Input(shape=(19200,3)))
model.add(LSTM(50, name = 'LSTM', activation='tanh',recurrent_activation='tanh', kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.05, seed=1), bias_initializer=initializers.zeros()))
model.add(Dense(1, name = 'Saida', activation='sigmoid', kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.05, seed=1), bias_initializer=initializers.zeros()))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
history = model.fit(X_train, y_train, epochs=20, batch_size=12, shuffle=True, validation_data=(x_val, y_val))
# Final evaluation of the model.
scores = model.evaluate(X_test, y_test, verbose=1)
print("Accuracy: %.2f%%" % (scores[1]*100))
Very simple, but not very organized; I'm still working on that.
And for this run, the results are:
Model: "sequential_8"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
LSTM (LSTM) (None, 50) 10800
_________________________________________________________________
Saida (Dense) (None, 1) 51
=================================================================
Total params: 10,851
Trainable params: 10,851
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/20
5/5 [==============================] - 17s 3s/step - loss: 0.6866 - accuracy: 0.6792 - val_loss: 0.6956 - val_accuracy: 0.0000e+00
Epoch 2/20
5/5 [==============================] - 20s 4s/step - loss: 0.6814 - accuracy: 0.8113 - val_loss: 0.6979 - val_accuracy: 0.0000e+00
Epoch 3/20
5/5 [==============================] - 21s 4s/step - loss: 0.6915 - accuracy: 0.7925 - val_loss: 0.7002 - val_accuracy: 0.0000e+00
Epoch 4/20
5/5 [==============================] - 24s 5s/step - loss: 0.6810 - accuracy: 0.7925 - val_loss: 0.7025 - val_accuracy: 0.0000e+00
Epoch 5/20
5/5 [==============================] - 25s 5s/step - loss: 0.6828 - accuracy: 0.7925 - val_loss: 0.7048 - val_accuracy: 0.0000e+00
Epoch 6/20
5/5 [==============================] - 24s 5s/step - loss: 0.6703 - accuracy: 0.8302 - val_loss: 0.7072 - val_accuracy: 0.0000e+00
Epoch 7/20
5/5 [==============================] - 24s 5s/step - loss: 0.6787 - accuracy: 0.7925 - val_loss: 0.7095 - val_accuracy: 0.0000e+00
Epoch 8/20
5/5 [==============================] - 26s 5s/step - loss: 0.6963 - accuracy: 0.7547 - val_loss: 0.7117 - val_accuracy: 0.0000e+00
Epoch 9/20
5/5 [==============================] - 25s 5s/step - loss: 0.6776 - accuracy: 0.7925 - val_loss: 0.7141 - val_accuracy: 0.0000e+00
Epoch 10/20
5/5 [==============================] - 25s 5s/step - loss: 0.6640 - accuracy: 0.8302 - val_loss: 0.7164 - val_accuracy: 0.0000e+00
Epoch 11/20
5/5 [==============================] - 24s 5s/step - loss: 0.6626 - accuracy: 0.8491 - val_loss: 0.7187 - val_accuracy: 0.0000e+00
Epoch 12/20
5/5 [==============================] - 24s 5s/step - loss: 0.6504 - accuracy: 0.8491 - val_loss: 0.7210 - val_accuracy: 0.0000e+00
Epoch 13/20
5/5 [==============================] - 24s 5s/step - loss: 0.6729 - accuracy: 0.7925 - val_loss: 0.7233 - val_accuracy: 0.0000e+00
Epoch 14/20
5/5 [==============================] - 24s 5s/step - loss: 0.6602 - accuracy: 0.8302 - val_loss: 0.7257 - val_accuracy: 0.0000e+00
Epoch 15/20
5/5 [==============================] - 25s 5s/step - loss: 0.6857 - accuracy: 0.7547 - val_loss: 0.7281 - val_accuracy: 0.0000e+00
Epoch 16/20
5/5 [==============================] - 23s 5s/step - loss: 0.6630 - accuracy: 0.8113 - val_loss: 0.7305 - val_accuracy: 0.0000e+00
Epoch 17/20
5/5 [==============================] - 25s 5s/step - loss: 0.6633 - accuracy: 0.7925 - val_loss: 0.7328 - val_accuracy: 0.0000e+00
Epoch 18/20
5/5 [==============================] - 24s 5s/step - loss: 0.6600 - accuracy: 0.8302 - val_loss: 0.7352 - val_accuracy: 0.0000e+00
Epoch 19/20
5/5 [==============================] - 25s 5s/step - loss: 0.6670 - accuracy: 0.8113 - val_loss: 0.7374 - val_accuracy: 0.0000e+00
Epoch 20/20
5/5 [==============================] - 24s 5s/step - loss: 0.6534 - accuracy: 0.8302 - val_loss: 0.7399 - val_accuracy: 0.0000e+00
2/2 [==============================] - 1s 314ms/step - loss: 0.7171 - accuracy: 0.4167
Accuracy: 41.67%
Summarizing: the loss is high and decreases very slowly. Accuracy varies, but in the end it stabilizes at the same value (usually 0.7925 or 0.8113). And my validation accuracy doesn't respond at all to any changes in the other metrics.
My main concern is that the validation data is not behaving as it should. I already tried changing the optimizer, the activation functions of every layer, the weight initializers, the number of epochs (went up to 100 several times, but nothing changed), the batch size, shuffling the data with the Keras function and with Python's built-in method, and so on.
The only thing I did not try was changing the input shapes, but, as I mentioned earlier, this was the only way I got the 3D array accepted by the input layer.
If you have any tips on what could be changed to achieve more consistent results, I would be very grateful.
Any additional commentary will be happily accepted.
This is my first question here and I am not a native English speaker, so sorry if anything was unclear.
Cheers, Matheus Zimmermann.
I think you can apply the to_categorical method (a one-hot encoding approach) to the y_train, y_val, and y_test variables.
I hope that after applying it, your validation accuracy will behave properly.
I faced the same type of problem before.
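For reference, here is a minimal sketch of what I mean. It assumes your y arrays hold integer class indices; the num_classes value below is only a placeholder for your real number of classes:
from keras.utils import to_categorical

num_classes = 2  # placeholder: set this to your actual number of classes
y_train = to_categorical(y_train, num_classes)
y_val = to_categorical(y_val, num_classes)
y_test = to_categorical(y_test, num_classes)
If you one-hot encode the labels this way, the output layer should have num_classes units with softmax activation and the model should be compiled with loss='categorical_crossentropy'.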
I am currently training a model using the Cars196 dataset from Stanford. However, even though the dataset is correctly imported and recognized by TensorFlow, my accuracy is still 0. I used a similar approach to train models on other datasets and it worked. Did I do anything wrong?
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import csv
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Flatten,Dense
car_dir = './src/'
test_dir = './src/cars_test/'
train_dir = './src/cars_train/'
train_labels_file = './src/labels-train.csv'
test_labels_file = './src/labels-test.csv'
IMG_SIZE = (150,150)
def read_labels(label_file: str):
    pathAndClass = list()
    with open(label_file) as csv_file:
        reader = csv.reader(csv_file)
        next(reader)  # skip the header row
        for row in reader:
            pathAndClass.append([row[5].lower(), row[4]])
    return pd.DataFrame(pathAndClass, columns=['path', 'class'])
pathAndClass = read_labels(train_labels_file)
n_classes = np.size(np.unique(pathAndClass['class']))
pathAndClass['path'] = pathAndClass['path'].astype(str)
pathAndClass['class'] = pathAndClass['class'].astype(str)
data_gen = ImageDataGenerator(rescale = 1.0/255.0, validation_split=0.25)
BATCH_SIZE = 32
index_list = []
for i in range(0, n_classes):
    index_list.append(str(i))
train_flow = data_gen.flow_from_dataframe(
    dataframe=pathAndClass,
    x_col='path',
    y_col='class',
    directory=train_dir,
    subset="training",
    seed=42,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    shuffle=True,
    classes=index_list,
    class_mode='categorical')
valid_flow = data_gen.flow_from_dataframe(
    dataframe=pathAndClass,
    x_col='path',
    y_col='class',
    directory=train_dir,
    subset="validation",
    seed=42,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    shuffle=True,
    classes=index_list,
    class_mode='categorical')
model_nn = Sequential()
model_nn.add(Flatten(input_shape=(150,150, 3)))
model_nn.add(Dense(300, activation="relu"))
model_nn.add(Dense(n_classes, activation="softmax"))
model_nn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model_nn.summary())
training = model_nn.fit(
    train_flow,
    steps_per_epoch=train_flow.n // train_flow.batch_size,
    epochs=10,
    validation_data=valid_flow,
    validation_steps=valid_flow.n // valid_flow.batch_size)
print(model_nn.evaluate(train_flow))
plt.plot(training.history['accuracy'])
plt.plot(training.history['val_accuracy'])
plt.plot(training.history['loss'])
plt.plot(training.history['val_loss'])
plt.title('Model accuracy/loss')
plt.ylabel('accuracy/loss')
plt.xlabel('epoch')
plt.legend(['accuracy', 'val_accuracy', 'loss', 'val_loss'])
plt.show()
The output I got
Found 6078 validated image filenames belonging to 196 classes.
Found 2026 validated image filenames belonging to 196 classes.
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_1 (Flatten) (None, 67500) 0
_________________________________________________________________
dense_2 (Dense) (None, 300) 20250300
_________________________________________________________________
dense_3 (Dense) (None, 196) 58996
=================================================================
Total params: 20,309,296
Trainable params: 20,309,296
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/10
189/189 [==============================] - 68s 361ms/step - loss: 9.6809 - accuracy: 0.0036 - val_loss: 5.2785 - val_accuracy: 0.0030
Epoch 2/10
189/189 [==============================] - 58s 307ms/step - loss: 5.2770 - accuracy: 0.0055 - val_loss: 5.2785 - val_accuracy: 0.0089
Epoch 3/10
189/189 [==============================] - 58s 307ms/step - loss: 5.2743 - accuracy: 0.0083 - val_loss: 5.2793 - val_accuracy: 0.0104
Epoch 4/10
189/189 [==============================] - 58s 306ms/step - loss: 5.2728 - accuracy: 0.0089 - val_loss: 5.2800 - val_accuracy: 0.0089
Epoch 5/10
189/189 [==============================] - 58s 307ms/step - loss: 5.2710 - accuracy: 0.0084 - val_loss: 5.2806 - val_accuracy: 0.0089
Epoch 6/10
189/189 [==============================] - 57s 305ms/step - loss: 5.2698 - accuracy: 0.0086 - val_loss: 5.2815 - val_accuracy: 0.0089
Epoch 7/10
189/189 [==============================] - 58s 307ms/step - loss: 5.2695 - accuracy: 0.0083 - val_loss: 5.2822 - val_accuracy: 0.0089
Epoch 8/10
189/189 [==============================] - 58s 310ms/step - loss: 5.2681 - accuracy: 0.0086 - val_loss: 5.2834 - val_accuracy: 0.0089
Epoch 9/10
189/189 [==============================] - 58s 306ms/step - loss: 5.2679 - accuracy: 0.0083 - val_loss: 5.2840 - val_accuracy: 0.0089
Epoch 10/10
189/189 [==============================] - 58s 308ms/step - loss: 5.2669 - accuracy: 0.0083 - val_loss: 5.2848 - val_accuracy: 0.0089
1578/Unknown - 339s 215ms/step - loss: 5.2657 - accuracy: 0.0085
Update 1
I increased the number of weight updates per epoch by decreasing the batch size to 8 and trained the model again. However, the accuracy is still nearly 0.
Epoch 1/10
759/759 [==============================] - 112s 147ms/step - loss: 7.6876 - accuracy: 0.0051 - val_loss: 5.2779 - val_accuracy: 0.0089
Epoch 2/10
759/759 [==============================] - 112s 148ms/step - loss: 5.2728 - accuracy: 0.0086 - val_loss: 5.2792 - val_accuracy: 0.0089
Epoch 3/10
759/759 [==============================] - 112s 148ms/step - loss: 5.2695 - accuracy: 0.0087 - val_loss: 5.2808 - val_accuracy: 0.0089
Epoch 4/10
759/759 [==============================] - 109s 143ms/step - loss: 5.2671 - accuracy: 0.0087 - val_loss: 5.2828 - val_accuracy: 0.0089
Epoch 5/10
759/759 [==============================] - 111s 146ms/step - loss: 5.2661 - accuracy: 0.0086 - val_loss: 5.2844 - val_accuracy: 0.0089
Epoch 6/10
759/759 [==============================] - 114s 151ms/step - loss: 5.2648 - accuracy: 0.0089 - val_loss: 5.2862 - val_accuracy: 0.0089
Epoch 7/10
759/759 [==============================] - 118s 156ms/step - loss: 5.2646 - accuracy: 0.0086 - val_loss: 5.2881 - val_accuracy: 0.0089
Epoch 8/10
759/759 [==============================] - 117s 155ms/step - loss: 5.2639 - accuracy: 0.0087 - val_loss: 5.2891 - val_accuracy: 0.0089
Epoch 9/10
759/759 [==============================] - 115s 151ms/step - loss: 5.2635 - accuracy: 0.0087 - val_loss: 5.2903 - val_accuracy: 0.0089
Epoch 10/10
759/759 [==============================] - 112s 147ms/step - loss: 5.2634 - accuracy: 0.0086 - val_loss: 5.2915 - val_accuracy: 0.0089
2390/Unknown - 141s 59ms/step - loss: 5.2611 - accuracy: 0.0088
Indeed, the last dataset I used had fewer classes but more samples. Maybe there is another model that fits my dataset better; any suggestions?
For computer vision problems, you want to look at Convolutional Neural Networks. If you're unfamiliar with them, they learn to identify features in images. Examples could be edges and textures in early layers, and then wheels, windows, doors, etc in later layers.
For this problem, I would suggest using an existing, pretrained network such as MobileNet V2 or InceptionNetV3 as a backbone, and then building your own classifier on top. This tutorial on the Tensorflow website will get you started https://www.tensorflow.org/tutorials/images/transfer_learning#create_the_base_model_from_the_pre-trained_convnets
Here's an excerpt from this tutorial:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
Then, adding your model code from above, you could try:
model = tf.keras.Sequential([
    base_model,
    Flatten(input_shape=(150, 150, 3)),
    Dense(300, activation="relu"),
    Dense(n_classes, activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
This is the model I've used on similar datasets and got reasonable accuracy with it:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
model = tf.keras.Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dense(n_classes, activation='softmax')
])
In your current model, you are not trying to extract any features from the images. A single hidden layer with 300 neurons is nowhere near enough to learn the features in images and give meaningful results.
You also need to check your input image size. MobileNet V2 works well with 224x224 colour images.
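For example, a minimal sketch of that size change (IMG_SIZE and IMG_SHAPE mirror the names used in the question and are only illustrative):
IMG_SIZE = (224, 224)        # instead of (150, 150)
IMG_SHAPE = IMG_SIZE + (3,)  # three colour channels

# Pass target_size=IMG_SIZE to the flow_from_dataframe calls above, then build
# the pretrained backbone with the matching input shape:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')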
As per the other comments, you will need to use the full dataset; you are not going to get any meaningful results with a few hundred images.
I would suggest that ~6000 training samples for almost 200 classes is simply way too little for the model to work well.
The model did ~2000 weight updates (~200 in each epoch), which is way too few for it to learn to distinguish between ~200 classes.
Maybe you had fewer classes and more training data in the other datasets?
I am writing a neural network for translating texts from Russian to English, but I ran into the problem that my network produces a high loss, as well as answers that are very far from correct.
Below is the LSTM model that I built using Keras:
def make_model(in_vocab, out_vocab, in_timesteps, out_timesteps, n):
    model = Sequential()
    model.add(Embedding(in_vocab, n, input_length=in_timesteps, mask_zero=True))
    model.add(LSTM(n))
    model.add(Dropout(0.3))
    model.add(RepeatVector(out_timesteps))
    model.add(LSTM(n, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(Dense(out_vocab, activation='softmax'))
    model.compile(optimizer=optimizers.RMSprop(lr=0.001), loss='sparse_categorical_crossentropy')
    return model
The training process is shown below:
Epoch 1/10
3/3 [==============================] - 5s 1s/step - loss: 8.3635 - accuracy: 0.0197 - val_loss: 8.0575 - val_accuracy: 0.0563
Epoch 2/10
3/3 [==============================] - 2s 806ms/step - loss: 7.9505 - accuracy: 0.0334 - val_loss: 8.2927 - val_accuracy: 0.0743
Epoch 3/10
3/3 [==============================] - 2s 812ms/step - loss: 7.7977 - accuracy: 0.0349 - val_loss: 8.2959 - val_accuracy: 0.0571
Epoch 4/10
3/3 [==============================] - 3s 825ms/step - loss: 7.6700 - accuracy: 0.0389 - val_loss: 8.5628 - val_accuracy: 0.0751
Epoch 5/10
3/3 [==============================] - 3s 829ms/step - loss: 7.5595 - accuracy: 0.0411 - val_loss: 8.5854 - val_accuracy: 0.0743
Epoch 6/10
3/3 [==============================] - 3s 807ms/step - loss: 7.4604 - accuracy: 0.0406 - val_loss: 8.7633 - val_accuracy: 0.0743
Epoch 7/10
3/3 [==============================] - 2s 815ms/step - loss: 7.3475 - accuracy: 0.0436 - val_loss: 8.9103 - val_accuracy: 0.0743
Epoch 8/10
3/3 [==============================] - 3s 825ms/step - loss: 7.2548 - accuracy: 0.0455 - val_loss: 9.0493 - val_accuracy: 0.0721
Epoch 9/10
3/3 [==============================] - 2s 814ms/step - loss: 7.1751 - accuracy: 0.0449 - val_loss: 9.0740 - val_accuracy: 0.0788
Epoch 10/10
3/3 [==============================] - 3s 831ms/step - loss: 7.1132 - accuracy: 0.0479 - val_loss: 9.2443 - val_accuracy: 0.0773
And the parameters that I pass for training:
model = make_model(  # the sizes of the tokenized vocabularies
    russian_vocab_size,
    english_vocab_size,
    # maximum sentence lengths
    max_russian_sequence_length,
    max_english_sequence_length,
    512)
model.fit(preproc_russian_sentences,  # all tokenized Russian sentences, passed with shape (X, Y)
          preproc_english_sentences,  # all tokenized English sentences, passed with shape (X, Y, 1)
          epochs=10,
          batch_size=1024,
          validation_split=0.2,
          callbacks=None,
          verbose=1)
Thank you in advance.
I am using TensorFlow and Keras to build a classification model. When running the code below, the output does not seem to converge after each epoch; the loss keeps growing in magnitude (toward large negative values) and the accuracy is constantly 0.0000e+00. I am new to machine learning and am not too sure why this is happening.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
import numpy as np
import time
import tensorflow as tf
from google.colab import drive
drive.mount('/content/drive')
import pandas as pd
data = pd.read_csv("hmnist_28_28_RGB.csv")
X = data.iloc[:, 0:-1]
y = data.iloc[:, -1]
X = X / 255.0
X = X.values.reshape(-1,28,28,3)
print(X.shape)
model = Sequential()
model.add(Conv2D(256, (3, 3), input_shape=X.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(256, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten()) # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(X, y, batch_size=32, epochs=10, validation_split=0.3)
Output
(378, 28, 28, 3)
Epoch 1/10
9/9 [==============================] - 4s 429ms/step - loss: -34.6735 - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/10
9/9 [==============================] - 4s 400ms/step - loss: -1074.2162 - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/10
9/9 [==============================] - 4s 399ms/step - loss: -7446.1872 - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 4/10
9/9 [==============================] - 4s 396ms/step - loss: -30012.9553 - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 5/10
9/9 [==============================] - 4s 406ms/step - loss: -89006.4180 - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 6/10
9/9 [==============================] - 4s 400ms/step - loss: -221087.9078 - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 7/10
9/9 [==============================] - 4s 399ms/step - loss: -480032.9313 - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 8/10
9/9 [==============================] - 4s 403ms/step - loss: -956052.3375 - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 9/10
9/9 [==============================] - 4s 396ms/step - loss: -1733128.9000 - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 10/10
9/9 [==============================] - 4s 401ms/step - loss: -2953626.5750 - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
You need to make several changes to your model to make it work.
There are 7 different labels in the dataset, so your last layer needs 7 output neurons.
For your last layer you are currently using sigmoid activation. This is not suitable for multi-class classification. Instead you should use the softmax activation.
As the loss function you are using loss='binary_crossentropy'. This should only be used for binary classification. In your case, since your labels are integers, loss='sparse_categorical_crossentropy' should be used. You can find more information here.
With the following changes to the last lines of your code:
model.add(Dense(7))
model.add(Activation('softmax'))
model.compile(loss='sparse_categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(X, y, batch_size=32, epochs=10, validation_split=0.3)
You'll get this training history:
(10015, 28, 28, 3)
Epoch 1/10
220/220 [==============================] - 89s 403ms/step - loss: 1.0345 - accuracy: 0.6193 - val_loss: 1.7980 - val_accuracy: 0.4353
Epoch 2/10
220/220 [==============================] - 88s 398ms/step - loss: 0.8282 - accuracy: 0.6851 - val_loss: 3.3646 - val_accuracy: 0.0676
Epoch 3/10
220/220 [==============================] - 88s 399ms/step - loss: 0.6944 - accuracy: 0.7502 - val_loss: 2.9686 - val_accuracy: 0.1228
Epoch 4/10
220/220 [==============================] - 87s 395ms/step - loss: 0.6630 - accuracy: 0.7611 - val_loss: 3.3777 - val_accuracy: 0.0646
Epoch 5/10
220/220 [==============================] - 87s 396ms/step - loss: 0.5976 - accuracy: 0.7812 - val_loss: 2.3929 - val_accuracy: 0.2532
Epoch 6/10
220/220 [==============================] - 87s 396ms/step - loss: 0.5577 - accuracy: 0.7935 - val_loss: 2.9879 - val_accuracy: 0.2592
Epoch 7/10
220/220 [==============================] - 88s 398ms/step - loss: 0.7644 - accuracy: 0.7215 - val_loss: 2.5258 - val_accuracy: 0.2852
Epoch 8/10
220/220 [==============================] - 87s 395ms/step - loss: 0.5629 - accuracy: 0.7879 - val_loss: 2.6053 - val_accuracy: 0.3055
Epoch 9/10
220/220 [==============================] - 89s 404ms/step - loss: 0.5380 - accuracy: 0.8008 - val_loss: 2.7401 - val_accuracy: 0.1694
Epoch 10/10
220/220 [==============================] - 92s 419ms/step - loss: 0.5296 - accuracy: 0.8065 - val_loss: 3.7208 - val_accuracy: 0.0529
The model still needs to be optimized to achieve better results, but in general it works.
I was using this file for the training.
I am trying to train my model using transfer learning. For this I am using the VGG16 model, stripped the top layers, and froze the first 2 layers to use the ImageNet initial weights. For fine-tuning I am using a learning rate of 0.0001, softmax activation, dropout 0.5, categorical crossentropy loss, the SGD optimizer, and 46 classes.
I am just unable to understand the behavior during training. Train loss and accuracy are both fine (loss decreasing, accuracy increasing). Val loss is decreasing and val accuracy is increasing as well, BUT they are always higher than the train loss and accuracy.
Assuming it's overfitting, I made the model less complex, increased the dropout rate, and added more samples to the validation data, but nothing seemed to work. I am a newbie, so any kind of help is appreciated.
Epoch 1/50
26137/26137 [==============================] - 7446s 285ms/step - loss: 1.1200 - accuracy: 0.3810 - val_loss: 3.1219 - val_accuracy: 0.4467
Epoch 2/50
26137/26137 [==============================] - 7435s 284ms/step - loss: 0.9944 - accuracy: 0.4353 - val_loss: 2.9348 - val_accuracy: 0.4694
Epoch 3/50
26137/26137 [==============================] - 7532s 288ms/step - loss: 0.9561 - accuracy: 0.4530 - val_loss: 1.6025 - val_accuracy: 0.4780
Epoch 4/50
26137/26137 [==============================] - 7436s 284ms/step - loss: 0.9343 - accuracy: 0.4631 - val_loss: 1.3032 - val_accuracy: 0.4860
Epoch 5/50
26137/26137 [==============================] - 7358s 282ms/step - loss: 0.9185 - accuracy: 0.4703 - val_loss: 1.4461 - val_accuracy: 0.4847
Epoch 6/50
26137/26137 [==============================] - 7396s 283ms/step - loss: 0.9083 - accuracy: 0.4748 - val_loss: 1.4093 - val_accuracy: 0.4908
Epoch 7/50
26137/26137 [==============================] - 7424s 284ms/step - loss: 0.8993 - accuracy: 0.4789 - val_loss: 1.4617 - val_accuracy: 0.4939
Epoch 8/50
26137/26137 [==============================] - 7433s 284ms/step - loss: 0.8925 - accuracy: 0.4822 - val_loss: 1.4257 - val_accuracy: 0.4978
Epoch 9/50
26137/26137 [==============================] - 7445s 285ms/step - loss: 0.8868 - accuracy: 0.4851 - val_loss: 1.5568 - val_accuracy: 0.4953
Epoch 10/50
26137/26137 [==============================] - 7387s 283ms/step - loss: 0.8816 - accuracy: 0.4874 - val_loss: 1.4534 - val_accuracy: 0.4970
Epoch 11/50
26137/26137 [==============================] - 7374s 282ms/step - loss: 0.8779 - accuracy: 0.4894 - val_loss: 1.4605 - val_accuracy: 0.4912
Epoch 12/50
26137/26137 [==============================] - 7411s 284ms/step - loss: 0.8733 - accuracy: 0.4915 - val_loss: 1.4694 - val_accuracy: 0.5030
Yes, you are facing an overfitting issue. To mitigate it, you can try to implement the steps below:
1. Shuffle the data by using shuffle=True in VGG16_model.fit. Code is shown below:
history = VGG16_model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1,
validation_data=(x_validation, y_validation), shuffle = True)
2. Use early stopping. Code is shown below:
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
3. Use regularization. Code for L2 regularization is shown below (you can try L1 or L1+L2 regularization as well):
from tensorflow.keras.regularizers import l2
Regularizer = l2(0.001)
VGG16_model.add(Conv2D(96, (11, 11), input_shape=(227, 227, 3), strides=(4, 4), padding='valid',
                       activation='relu', data_format='channels_last',
                       activity_regularizer=Regularizer, kernel_regularizer=Regularizer))
VGG16_model.add(Dense(units=2, activation='sigmoid',
                      activity_regularizer=Regularizer, kernel_regularizer=Regularizer))
4. You can try using BatchNormalization.
5. Perform image data augmentation using ImageDataGenerator; refer to this link for more info (a minimal sketch follows this list).
6. If the pixels are not normalized, dividing the pixel values by 255 also helps.
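Here is a minimal sketch combining points 2, 5, and 6; x_train, y_train, x_validation, y_validation, batch_size, epochs, and VGG16_model are assumed to be the same placeholders used above:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Points 5 and 6: augment the training images and rescale pixel values to [0, 1].
# (The validation images should be rescaled the same way.)
train_datagen = ImageDataGenerator(rescale=1.0/255,
                                   rotation_range=20,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   horizontal_flip=True)

# Point 2: stop training once val_loss stops improving.
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)

history = VGG16_model.fit(train_datagen.flow(x_train, y_train, batch_size=batch_size),
                          epochs=epochs,
                          validation_data=(x_validation, y_validation),
                          callbacks=[callback],
                          verbose=1)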
I have created the following toy dataset:
I am trying to predict the class with a neural net in Keras:
model = Sequential()
model.add(Dense(units=2, activation='sigmoid', input_shape= (nr_feats,)))
model.add(Dense(units=nr_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
With nr_feats and nr_classes set to 2.
The neural net can only predict with 50 percent accuracy returning either all 1's or all 2's. Using Logistic Regression results in 100 percent accuracy.
I can not find what is going wrong here.
I have uploaded a notebook to GitHub if you want to quickly try something.
EDIT 1
I drastically increased the number of epochs and accuracy finally starts to improve from 0.5 at epoch 72 and converges to 1.0 at epoch 98.
This still seems extremely slow for such a simple dataset.
I am aware it is better to use a single output neuron with sigmoid activation but it's more that I want to understand why it does not work with two output neurons and softmax activation.
I pre-process my dataframe as follows:
from sklearn.preprocessing import LabelEncoder
x_train = df_train.iloc[:,0:-1].values
y_train = df_train.iloc[:, -1]
nr_feats = x_train.shape[1]
nr_classes = y_train.nunique()
label_enc = LabelEncoder()
label_enc.fit(y_train)
y_train = keras.utils.to_categorical(label_enc.transform(y_train), nr_classes)
Training and evaluation:
model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=True)
accuracy_score(model.predict_classes(x_train), df_train.iloc[:, -1].values)
EDIT 2
After changing the output layer to a single neuron with sigmoid activation and using binary_crossentropy loss as modesitt suggested, accuracy still remains at 0.5 for 200 epochs and converges to 1.0 100 epochs later.
Note: Read the "Update" section at the end of my answer if you want the true reason. In this scenario, the other two reasons I have mentioned are only valid when the learning rate is set to a low value (less than 1e-3).
I put together some code. It is very similar to yours, but I just cleaned it up a little and made it simpler for myself. As you can see, I use a dense layer with a single unit and a sigmoid activation function for the last layer, and I just changed the optimizer from adam to rmsprop (it does not matter much; you can use adam if you like):
import numpy as np
import random
# generate random data with two features
n_samples = 200
n_feats = 2
cls0 = np.random.uniform(low=0.2, high=0.4, size=(n_samples,n_feats))
cls1 = np.random.uniform(low=0.5, high=0.7, size=(n_samples,n_feats))
x_train = np.concatenate((cls0, cls1))
y_train = np.concatenate((np.zeros((n_samples,)), np.ones((n_samples,))))
# shuffle data because all negatives (i.e. class "0") are first
# and then all positives (i.e. class "1")
indices = np.arange(x_train.shape[0])
np.random.shuffle(indices)
x_train = x_train[indices]
y_train = y_train[indices]
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(2, activation='sigmoid', input_shape=(n_feats,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
model.summary()
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=True)
Here is the output:
Layer (type) Output Shape Param #
=================================================================
dense_25 (Dense) (None, 2) 6
_________________________________________________________________
dense_26 (Dense) (None, 1) 3
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
400/400 [==============================] - 0s 966us/step - loss: 0.7013 - acc: 0.5000
Epoch 2/5
400/400 [==============================] - 0s 143us/step - loss: 0.6998 - acc: 0.5000
Epoch 3/5
400/400 [==============================] - 0s 137us/step - loss: 0.6986 - acc: 0.5000
Epoch 4/5
400/400 [==============================] - 0s 149us/step - loss: 0.6975 - acc: 0.5000
Epoch 5/5
400/400 [==============================] - 0s 132us/step - loss: 0.6966 - acc: 0.5000
As you can see, the accuracy never increases beyond 50%. What if you increase the number of epochs to, say, 50?
Layer (type) Output Shape Param #
=================================================================
dense_35 (Dense) (None, 2) 6
_________________________________________________________________
dense_36 (Dense) (None, 1) 3
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________
Epoch 1/50
400/400 [==============================] - 0s 1ms/step - loss: 0.6925 - acc: 0.5000
Epoch 2/50
400/400 [==============================] - 0s 136us/step - loss: 0.6902 - acc: 0.5000
Epoch 3/50
400/400 [==============================] - 0s 133us/step - loss: 0.6884 - acc: 0.5000
Epoch 4/50
400/400 [==============================] - 0s 160us/step - loss: 0.6866 - acc: 0.5000
Epoch 5/50
400/400 [==============================] - 0s 140us/step - loss: 0.6848 - acc: 0.5000
Epoch 6/50
400/400 [==============================] - 0s 168us/step - loss: 0.6832 - acc: 0.5000
Epoch 7/50
400/400 [==============================] - 0s 154us/step - loss: 0.6817 - acc: 0.5000
Epoch 8/50
400/400 [==============================] - 0s 146us/step - loss: 0.6802 - acc: 0.5000
Epoch 9/50
400/400 [==============================] - 0s 161us/step - loss: 0.6789 - acc: 0.5000
Epoch 10/50
400/400 [==============================] - 0s 140us/step - loss: 0.6778 - acc: 0.5000
Epoch 11/50
400/400 [==============================] - 0s 177us/step - loss: 0.6766 - acc: 0.5000
Epoch 12/50
400/400 [==============================] - 0s 180us/step - loss: 0.6755 - acc: 0.5000
Epoch 13/50
400/400 [==============================] - 0s 165us/step - loss: 0.6746 - acc: 0.5000
Epoch 14/50
400/400 [==============================] - 0s 128us/step - loss: 0.6736 - acc: 0.5000
Epoch 15/50
400/400 [==============================] - 0s 125us/step - loss: 0.6728 - acc: 0.5000
Epoch 16/50
400/400 [==============================] - 0s 165us/step - loss: 0.6718 - acc: 0.5000
Epoch 17/50
400/400 [==============================] - 0s 161us/step - loss: 0.6710 - acc: 0.5000
Epoch 18/50
400/400 [==============================] - 0s 170us/step - loss: 0.6702 - acc: 0.5000
Epoch 19/50
400/400 [==============================] - 0s 122us/step - loss: 0.6694 - acc: 0.5000
Epoch 20/50
400/400 [==============================] - 0s 110us/step - loss: 0.6686 - acc: 0.5000
Epoch 21/50
400/400 [==============================] - 0s 142us/step - loss: 0.6676 - acc: 0.5000
Epoch 22/50
400/400 [==============================] - 0s 142us/step - loss: 0.6667 - acc: 0.5000
Epoch 23/50
400/400 [==============================] - 0s 149us/step - loss: 0.6659 - acc: 0.5000
Epoch 24/50
400/400 [==============================] - 0s 125us/step - loss: 0.6651 - acc: 0.5000
Epoch 25/50
400/400 [==============================] - 0s 134us/step - loss: 0.6643 - acc: 0.5000
Epoch 26/50
400/400 [==============================] - 0s 143us/step - loss: 0.6634 - acc: 0.5000
Epoch 27/50
400/400 [==============================] - 0s 137us/step - loss: 0.6625 - acc: 0.5000
Epoch 28/50
400/400 [==============================] - 0s 131us/step - loss: 0.6616 - acc: 0.5025
Epoch 29/50
400/400 [==============================] - 0s 119us/step - loss: 0.6608 - acc: 0.5100
Epoch 30/50
400/400 [==============================] - 0s 143us/step - loss: 0.6601 - acc: 0.5025
Epoch 31/50
400/400 [==============================] - 0s 148us/step - loss: 0.6593 - acc: 0.5350
Epoch 32/50
400/400 [==============================] - 0s 161us/step - loss: 0.6584 - acc: 0.5325
Epoch 33/50
400/400 [==============================] - 0s 152us/step - loss: 0.6576 - acc: 0.5700
Epoch 34/50
400/400 [==============================] - 0s 128us/step - loss: 0.6568 - acc: 0.5850
Epoch 35/50
400/400 [==============================] - 0s 155us/step - loss: 0.6560 - acc: 0.5975
Epoch 36/50
400/400 [==============================] - 0s 136us/step - loss: 0.6552 - acc: 0.6425
Epoch 37/50
400/400 [==============================] - 0s 140us/step - loss: 0.6544 - acc: 0.6150
Epoch 38/50
400/400 [==============================] - 0s 120us/step - loss: 0.6538 - acc: 0.6375
Epoch 39/50
400/400 [==============================] - 0s 140us/step - loss: 0.6531 - acc: 0.6725
Epoch 40/50
400/400 [==============================] - 0s 135us/step - loss: 0.6523 - acc: 0.6750
Epoch 41/50
400/400 [==============================] - 0s 136us/step - loss: 0.6515 - acc: 0.7300
Epoch 42/50
400/400 [==============================] - 0s 126us/step - loss: 0.6505 - acc: 0.7450
Epoch 43/50
400/400 [==============================] - 0s 141us/step - loss: 0.6496 - acc: 0.7425
Epoch 44/50
400/400 [==============================] - 0s 162us/step - loss: 0.6489 - acc: 0.7675
Epoch 45/50
400/400 [==============================] - 0s 161us/step - loss: 0.6480 - acc: 0.7775
Epoch 46/50
400/400 [==============================] - 0s 126us/step - loss: 0.6473 - acc: 0.7575
Epoch 47/50
400/400 [==============================] - 0s 124us/step - loss: 0.6464 - acc: 0.7625
Epoch 48/50
400/400 [==============================] - 0s 130us/step - loss: 0.6455 - acc: 0.7950
Epoch 49/50
400/400 [==============================] - 0s 191us/step - loss: 0.6445 - acc: 0.8100
Epoch 50/50
400/400 [==============================] - 0s 163us/step - loss: 0.6435 - acc: 0.8625
The accuracy starts to increase. (Note that if you train this model multiple times, it may take a different number of epochs each time to reach an acceptable accuracy, anywhere from 10 to 100 epochs.)
Also, in my experiments I noticed that increasing the number of units in the first dense layer, for example to 5 or 10 units, causes the model to train faster (i.e. converge more quickly).
Why are so many epochs needed?
I think it is because of these two reasons (combined):
1) Despite the fact that the two classes are easily separable, your data is made up of random samples, and
2) The number of data points compared to the size of the neural net (i.e. the number of trainable parameters, which is 9 in the example code above) is relatively large.
Therefore, it takes more epochs for the model to learn the weights. It is as though the model is very restricted and needs more and more experience to correctly find the appropriate weights. As evidence, just try increasing the number of units in the first dense layer. You are almost guaranteed to reach an accuracy above 90% in fewer than 10 epochs each time you train this model. Here you increase the capacity, and therefore the model converges (i.e. trains) much faster. (It should be noted that it starts to overfit if the capacity is too high or you train the model for too many epochs; you should have a validation scheme to monitor this.)
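As a quick illustration of that suggestion, here is a sketch of the same toy model with a wider first layer (x_train, y_train, and n_feats are the variables generated in the snippet above):
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(10, activation='sigmoid', input_shape=(n_feats,)))  # 10 units instead of 2
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=True)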
Side note:
Don't set the high argument to a number less than the low argument in numpy.random.uniform since, according to the documentation, the results will be "officially undefined" in this case.
Update:
One more important thing here (maybe the most important thing in this scenario) is the learning rate of the optimizer. If the learning rate is too low, the model converges slowly. Try increasing the learning rate, and you will see that you reach an accuracy of 100% in fewer than 5 epochs:
from keras import optimizers
model.compile(loss='binary_crossentropy',
optimizer=optimizers.RMSprop(lr=1e-1),
metrics=['accuracy'])
# or you may use adam
model.compile(loss='binary_crossentropy',
optimizer=optimizers.Adam(lr=1e-1),
metrics=['accuracy'])
The issue is that your labels are 1 and 2 instead of 0 and 1. Keras will not raise an error when it sees 2, but it is not capable of predicting 2.
Subtract 1 from all your y values. As a side note, it is common in deep learning to use 1 neuron with sigmoid for binary classification (0 or 1) vs 2 classes with softmax. Finally, use binary_crossentropy for the loss for binary classification problems.
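A minimal sketch of those changes (x_train, y_train, and nr_feats come from the preprocessing in the question, with y_train still holding the original 1/2 labels):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

y_train = np.asarray(y_train) - 1   # labels become 0 and 1

model = Sequential()
model.add(Dense(units=2, activation='sigmoid', input_shape=(nr_feats,)))
model.add(Dense(units=1, activation='sigmoid'))   # single sigmoid output neuron
model.compile(loss='binary_crossentropy',         # binary loss for 0/1 labels
              optimizer='adam',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=True)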