I am trying to create a regression model, but my validation accuracy stays at 0.5073. I am training on images and want the network to find the position of an object and the rough area it covers. When I increased the number of unfrozen layers, the accuracy plateau dropped to 0.4927. I would appreciate any help figuring out what I am doing wrong.
base = MobileNet(weights='imagenet', include_top=False, input_shape=(200,200,3), dropout=.3)
location = base.output
location = GlobalAveragePooling2D()(location)
location = Dense(16, activation='relu', name="locdense1")(location)
location = Dense(32, activation='relu', name="locdense2")(location)
location = Dense(64, activation='relu', name="locdense3")(location)
finallocation = Dense(3, activation='sigmoid', name="finalLocation")(location)
model = Model(inputs=base.input, outputs=finallocation)  # [types, finallocation]
for layer in model.layers[:91]:  # freeze everything before index 91 except the new 'loc' layers
    if 'loc' in layer.name.lower():
        layer.trainable = True
    else:
        layer.trainable = False
optimizer = Adam(learning_rate=.001)
model.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['accuracy'])
history = model.fit(get_batches(type='Train'), validation_data=get_batches(type='Validation'), validation_steps=500, steps_per_epoch=1000, epochs=10)
Data is generated from a TFRecord file that contains image data and some labels. This is the last bit of that generator:
IMG_SIZE = 200

def format_position(image, positionx, positiony, width):
    image = tf.cast(image, tf.float32)
    image = (image / 127.5) - 1
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    labels = tf.stack([positionx, positiony, width])
    return image, labels
Get batches (the dataset is loaded from two directories with TFRecord files, one for training and the other for validation):
def get_batches(type):
    dataset = load_dataset(type=type)
    if type == 'Train':
        dataset = dataset.repeat()  # repeat only the training set
    databatch = dataset.batch(32)
    databatch = databatch.prefetch(2)
    return databatch
`positionx`, `positiony`, and `width` are all normalized to 0-1 (relative position with respect to the image).
Here is an example output:
Epoch 1/10
1000/1000 [==============================] - 233s 233ms/step - loss: 0.0267 - accuracy: 0.5833 - val_loss: 0.0330 - val_accuracy: 0.5073
Epoch 2/10
1000/1000 [==============================] - 283s 283ms/step - loss: 0.0248 - accuracy: 0.6168 - val_loss: 0.0337 - val_accuracy: 0.5073
Epoch 3/10
1000/1000 [==============================] - 221s 221ms/step - loss: 0.0238 - accuracy: 0.6309 - val_loss: 0.0312 - val_accuracy: 0.5073
The final activation function in your model should not be sigmoid, since sigmoid outputs numbers between 0 and 1, and I am assuming your labels (i.e., positionx, positiony, and width) are not in this range. You could replace it with either 'linear' or 'relu'.
You're doing regression, and your loss function is 'mean_squared_error', so accuracy is not a meaningful metric here. Use 'mae' (mean absolute error) or 'mse' to check the difference between your predictions and the actual target values.
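Putting both suggestions together, a minimal sketch of the corrected output layer and compile step (keeping the variable names from the question) might look like this:

# linear output: predictions are no longer squashed into (0, 1)
finallocation = Dense(3, activation='linear', name="finalLocation")(location)

# track a regression metric instead of accuracy
model.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['mae'])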
I have been given 10000 images of shape (100,100) representing detections of particles. I have then created 10000 empty images of shape (100,100) and mixed them together, giving labels of 1 and 0 to the respective types, as seen in the code here:
Labels = np.append(np.ones(10000), np.zeros(empty_sheets.shape[0]))

# scaling each image so that it has a maximum value of 1
images_scale1 = np.zeros(s)
l = s[0]
for i in range(l):
    images_scale1[i] = images[i] / np.amax(images[i])
empty_sheets_noise1 = add_noise(empty_sheets, 0)
scale1noise1 = np.concatenate((images_scale1, empty_sheets_noise1), axis=0)
y11 = Labels
scale1noise1s, y11s = shuffle(scale1noise1, y11)
scale1noise1s_train, scale1noise1s_test, y11s_train, y11s_test = train_test_split(
    scale1noise1s, y11s, test_size=0.25)  # y11s (shuffled), not y11, so X and y stay aligned

# reshaping image arrays so that they can be passed through the CNN
scale1noise1s_train = scale1noise1s_train.reshape(scale1noise1s_train.shape[0], 100, 100, 1)
scale1noise1s_test = scale1noise1s_test.reshape(scale1noise1s_test.shape[0], 100, 100, 1)
y11s_train = y11s_train.reshape(y11s_train.shape[0], 1)
y11s_test = y11s_test.reshape(y11s_test.shape[0], 1)
Then to set up my model I create a new function:
def create_model():
    # initiates new model
    model = keras.models.Sequential()
    model.add(keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(100,100,1)))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Dropout(0.2))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(32))
    model.add(keras.layers.Dense(64))
    model.add(keras.layers.Dense(1, activation='sigmoid'))
    return model
estimators1m1 = create_model()
estimators1m1.compile(optimizer='adam',
                      metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
                      loss='binary_crossentropy')
history = estimators1m1.fit(scale1noise1s_train, y11s_train, epochs=3,
                            validation_data=(scale1noise1s_test, y11s_test))
which produces the following:
Epoch 1/3
469/469 [==============================] - 62s 131ms/step - loss: 0.6939 - accuracy: 0.4917 - precision_2: 0.4905 - recall_2: 0.4456 - val_loss: 0.6933 - val_accuracy: 0.5012 - val_precision_2: 0.5012 - val_recall_2: 1.0000
Epoch 2/3
469/469 [==============================] - 63s 134ms/step - loss: 0.6889 - accuracy: 0.5227 - precision_2: 0.5209 - recall_2: 0.5564 - val_loss: 0.6976 - val_accuracy: 0.4994 - val_precision_2: 0.5014 - val_recall_2: 0.2191
Epoch 3/3
469/469 [==============================] - 59s 127ms/step - loss: 0.6527 - accuracy: 0.5783 - precision_2: 0.5764 - recall_2: 0.5887 - val_loss: 0.7298 - val_accuracy: 0.5000 - val_precision_2: 0.5028 - val_recall_2: 0.2131
I have tried more epochs and I still only manage to get 50% accuracy, which is useless, since the model just predicts the same class constantly.
There can be many reasons why your model is not working. The most likely one is that the model is under-fitting: accuracy is low on both the training and validation sets, meaning the neural network is unable to capture the pattern in the data. You should therefore consider building a somewhat more complex model by adding more layers (see the sketch below), while avoiding over-fitting with techniques like dropout. You should also find the best parameters by doing hyperparameter tuning.
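As an illustration only, a deeper variant of create_model (same input shape; the extra layer sizes are assumptions, not tuned values) could look like this:

from tensorflow import keras  # assuming tf.keras, as in the question

def create_deeper_model():
    model = keras.models.Sequential()
    model.add(keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(100, 100, 1)))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(64, activation='relu'))  # note: the original Dense layers had no activation
    model.add(keras.layers.Dropout(0.5))  # dropout to keep the larger model from over-fitting
    model.add(keras.layers.Dense(1, activation='sigmoid'))
    return model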
I'm trying to train a simple model for the Yelp binary classification task.
Load BERT encoder:
gs_folder_bert = "gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12"
bert_config_file = os.path.join(gs_folder_bert, "bert_config.json")
config_dict = json.loads(tf.io.gfile.GFile(bert_config_file).read())
bert_config = bert.configs.BertConfig.from_dict(config_dict)
_, bert_encoder = bert.bert_models.classifier_model(
    bert_config, num_labels=2)
checkpoint = tf.train.Checkpoint(model=bert_encoder)
checkpoint.restore(
    os.path.join(gs_folder_bert, 'bert_model.ckpt')).assert_consumed()
Load data:
data, info = tfds.load('yelp_polarity_reviews', with_info=True, batch_size=-1, as_supervised=True)
train_x_orig, train_y_orig = tfds.as_numpy(data['train'])
train_x = encode_examples(train_x_orig)
train_y = train_y_orig
Use BERT to embed the data:
encoder_output = bert_encoder.predict(train_x)
Set up the model:
inputs = keras.Input(shape=(768,))
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dense(8, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
sgd = SGD(lr=0.0001)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
Train:
model.fit(encoder_output[0], train_y, batch_size=64, epochs=3)
# encoder_output[0].shape === (10000, 1, 768)
# y_train.shape === (100000,)
Training results:
Epoch 1/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6921 - accuracy: 0.5455
Epoch 2/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6918 - accuracy: 0.5455
Epoch 3/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6915 - accuracy: 0.5412
Epoch 4/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6913 - accuracy: 0.5407
Epoch 5/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6911 - accuracy: 0.5358
I tried different learning rates, but the main issue seems to be that training takes only 1 second per epoch and the accuracy stays at ~0.5. Am I not setting up the inputs/model correctly?
Your BERT model is not training. It has to be placed before the dense layers and trained as part of the model: the input layer has to take not precomputed BERT vectors, but the sequence of tokens cropped to max_length and padded. Here is example code: https://keras.io/examples/nlp/text_extraction_with_bert/, see the beginning of the create_model function.
Alternatively, you can use Trainer from transformers.
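For illustration, a minimal sketch of that idea using the Hugging Face transformers library (an assumption; the question uses the official models repo instead), with BERT wired into the model so it is fine-tuned end to end:

import tensorflow as tf
from transformers import TFBertModel  # assumes the transformers package is installed

max_length = 128  # assumed maximum token sequence length

input_ids = tf.keras.Input(shape=(max_length,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(max_length,), dtype=tf.int32, name="attention_mask")

# BERT sits inside the model, so its weights receive gradients during fit()
bert = TFBertModel.from_pretrained("bert-base-uncased")
pooled = bert(input_ids, attention_mask=attention_mask).pooler_output  # (batch, 768)

x = tf.keras.layers.Dense(64, activation="relu")(pooled)
x = tf.keras.layers.Dense(8, activation="relu")(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),  # small learning rate, typical for fine-tuning
              loss="binary_crossentropy",
              metrics=["accuracy"])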
Training and validation are healthy for 2 epochs, but after 2-3 epochs the val_loss keeps increasing while the val_acc also keeps increasing.
I'm trying to train a CNN model to classify a given review into a single class from 1-5, so I treated it as a multi-class classification problem.
I've divided the dataset into 3 sets: 70% training, 20% testing, and 10% validation.
The distribution of training data across the 5 classes is as follows.
1 - 31613, 2 - 32527, 3 - 61044, 4 - 140005, 5 - 173023.
Therefore I've added class weights as follows.
{1: 5.47, 2: 5.32, 3: 2.83, 4: 1.26, 5: 1}
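For reference, these weights correspond roughly to the inverse frequency of each class relative to the largest one; a quick sketch of that computation (the values come out close to, though not exactly, the ones above):

counts = {1: 31613, 2: 32527, 3: 61044, 4: 140005, 5: 173023}
max_count = max(counts.values())

# weight each class by how rare it is relative to the most frequent class
class_weights = {c: round(max_count / n, 2) for c, n in counts.items()}
# -> {1: 5.47, 2: 5.32, 3: 2.83, 4: 1.24, 5: 1.0}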
Model structure is as below.
input_layer = Input(shape=(max_length, ), dtype='int32')
embedding = Embedding(vocab_size, 200, input_length=max_length)(input_layer)
channel1 = Conv1D(filters=100, kernel_size=2, padding='valid', activation='relu', strides=1)(embedding)
channel1 = GlobalMaxPooling1D()(channel1)
channel2 = Conv1D(filters=100, kernel_size=3, padding='valid', activation='relu', strides=1)(embedding)
channel2 = GlobalMaxPooling1D()(channel2)
channel3 = Conv1D(filters=100, kernel_size=4, padding='valid', activation='relu', strides=1)(embedding)
channel3 = GlobalMaxPooling1D()(channel3)
merged = concatenate([channel1, channel2, channel3], axis=1)
merged = Dense(256, activation='relu')(merged)
merged = Dropout(0.6)(merged)
merged = Dense(5)(merged)
output = Activation('softmax')(merged)
model = Model(inputs=[input_layer], outputs=[output])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])
model.fit(final_X_train, final_Y_train, epochs=5, batch_size=512, validation_data=(final_X_val, final_Y_val), callbacks=callback, class_weight=class_weights)
1/5 - loss: 1.8733 - categorical_accuracy: 0.5892 - val_loss: 0.7749 - val_categorical_accuracy: 0.6558
2/5 - loss: 1.3908 - categorical_accuracy: 0.6917 - val_loss: 0.7421 - val_categorical_accuracy: 0.6784
3/5 - loss: 0.9587 - categorical_accuracy: 0.7734 - val_loss: 0.7595 - val_categorical_accuracy: 0.6947
4/5 - loss: 0.6402 - categorical_accuracy: 0.8370 - val_loss: 0.7921 - val_categorical_accuracy: 0.7216
5/5 - loss: 0.4520 - categorical_accuracy: 0.8814 - val_loss: 0.8556 - val_categorical_accuracy: 0.7331
Final accuracy = 0.7328754744261703
This seems to be over-fitting behavior, but I've tried adding dropout layers, which didn't help. I've also tried increasing the data, which made the results even worse.
I'm totally new to deep learning; if anyone has any suggestions for improvement, please let me know.
"val_loss keeps increasing while the val_acc keeps increasing" — this may be because of the loss function: the loss is calculated from the actual predicted probabilities, while accuracy only checks whether the one-hot argmax matches the true class.
Let's take a 4-class example. For one review the true class is, say, class 1 (zero-indexed), and the predicted probabilities are [0.25, 0.30, 0.25, 0.20]. According to categorical_accuracy your output is correct, i.e. it matches [0, 1, 0, 0], but since the probability mass is so spread out, categorical_crossentropy gives a high loss as well.
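To make the arithmetic concrete, a quick sketch of that example:

import numpy as np

y_true = np.array([0, 1, 0, 0])              # one-hot true class
y_pred = np.array([0.25, 0.30, 0.25, 0.20])  # predicted probabilities

# categorical_accuracy: the argmax matches the true class, so this counts as correct
correct = np.argmax(y_pred) == np.argmax(y_true)  # True

# categorical_crossentropy: -log p(true class) is still high at p = 0.30
loss = -np.sum(y_true * np.log(y_pred))           # ≈ 1.204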
As for the over-fitting problem, I am not really sure why introducing more data is causing problems.
Try increasing the strides (see the sketch below).
Don't make the data more imbalanced by adding data to any particular class.
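As an illustration (the stride value is hypothetical, not tuned), increasing the strides on one of the Conv1D channels would look like:

# stride of 2 halves the output length of the channel, reducing capacity to memorize
channel1 = Conv1D(filters=100, kernel_size=2, padding='valid',
                  activation='relu', strides=2)(embedding)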
I am facing a problem whose behaviour I have difficulty understanding.
I am trying to use a pre-trained ResNet50 (Keras) model for binary image classification, and I also built a simple CNN. I have about 8k balanced RGB images of size 200x200, and I divided this set into three subsets (train 70%, validation 15%, test 15%).
I built a generator to feed data to my models based on keras.utils.Sequence.
The problem I have is that my models tend to learn the training set, but on the validation set I get poor results with both the pre-trained ResNet50 and the simple CNN.
I tried several things to solve this problem, but no improvement at all:
With and without data augmentation on the training set (rotation)
Images normalised between [0,1]
With and without regularizers
Variation of the learning rate
This is an example of results obtained:
Epoch 1/200
716/716 [==============================] - 320s 447ms/step - loss: 8.6096 - acc: 0.4728 - val_loss: 8.6140 - val_acc: 0.5335
Epoch 00001: val_loss improved from inf to 8.61396, saving model to ../models_saved/resnet_adam_best.h5
Epoch 2/200
716/716 [==============================] - 287s 401ms/step - loss: 8.1217 - acc: 0.5906 - val_loss: 10.9314 - val_acc: 0.4632
Epoch 00002: val_loss did not improve from 8.61396
Epoch 3/200
716/716 [==============================] - 249s 348ms/step - loss: 7.5357 - acc: 0.6695 - val_loss: 11.1432 - val_acc: 0.4657
Epoch 00003: val_loss did not improve from 8.61396
Epoch 4/200
716/716 [==============================] - 284s 397ms/step - loss: 7.5092 - acc: 0.6828 - val_loss: 10.0665 - val_acc: 0.5351
Epoch 00004: val_loss did not improve from 8.61396
Epoch 5/200
716/716 [==============================] - 261s 365ms/step - loss: 7.0679 - acc: 0.7102 - val_loss: 4.2205 - val_acc: 0.5351
Epoch 00005: val_loss improved from 8.61396 to 4.22050, saving model to ../models_saved/resnet_adam_best.h5
Epoch 6/200
716/716 [==============================] - 285s 398ms/step - loss: 6.9945 - acc: 0.7161 - val_loss: 10.2276 - val_acc: 0.5335
....
These are the classes used to load data into my models.
class DataGenerator(keras.utils.Sequence):
    def __init__(self, inputs, labels, img_size, input_shape,
                 batch_size, num_classes, validation=False):
        self.inputs = inputs
        self.labels = labels
        self.img_size = img_size
        self.input_shape = input_shape
        self.batch_size = batch_size
        self.num_classes = num_classes
        self.validation = validation
        self.indexes = np.arange(len(self.inputs))
        self.inc = 0

    def __getitem__(self, index):
        """Generate one batch of data

        Parameters
        ----------
        index : the index from which the batch will be taken

        Returns
        -------
        out : a tuple that contains (inputs, associated labels)
        """
        batch_inputs = np.zeros((self.batch_size, *self.input_shape))
        batch_labels = np.zeros((self.batch_size, self.num_classes))
        # Generate data
        for i in range(self.batch_size):
            # validation walks through the set in order; training samples at random
            if self.validation:
                index = self.indexes[self.inc]
                self.inc += 1
                if self.inc == len(self.inputs):
                    self.inc = 0
            else:
                index = random.randint(0, len(self.inputs) - 1)
            batch_inputs[i] = self.rgb_processing(self.inputs[index])
            batch_labels[i] = to_categorical(self.labels[index], num_classes=self.num_classes)
        return batch_inputs, batch_labels

    def __len__(self):
        """Denotes the number of batches per epoch

        Returns
        -------
        out : number of batches per epoch
        """
        return int(np.floor(len(self.inputs) / self.batch_size))

    def rgb_processing(self, path):
        img = load_img(path)
        rgb = img.get_rgb_array()
        if not self.validation:
            # random rotation as training-time augmentation
            if random.choice([True, False]):
                rgb = random_rotation(rgb)
        return rgb / np.max(rgb)
class Models:
    def __init__(self, input_shape, classes):
        self.input_shape = input_shape
        self.classes = classes

    def simpleCNN(self, optimizer):
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(3, 3),
                         activation='relu',
                         input_shape=self.input_shape))
        model.add(Conv2D(64, (3, 3), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        model.add(Flatten())
        model.add(Dense(128, activation='relu'))
        model.add(Dropout(0.5))
        model.add(Dense(len(self.classes), activation='softmax'))
        model.compile(loss=keras.losses.binary_crossentropy,
                      optimizer=optimizer,
                      metrics=['accuracy'])
        return model

    def resnet50(self, optimizer):
        model = keras.applications.resnet50.ResNet50(include_top=False,
                                                     input_shape=self.input_shape,
                                                     weights='imagenet')
        model.summary()
        model.layers.pop()
        model.summary()
        for layer in model.layers:
            layer.trainable = False
        output = Flatten()(model.output)
        # I also tried to add dropout layers here with batch normalization,
        # but it does not change the results
        output = Dense(len(self.classes), activation='softmax')(output)
        finetuned_model = Model(inputs=model.input, outputs=output)
        finetuned_model.compile(optimizer=optimizer,
                                loss=keras.losses.binary_crossentropy,
                                metrics=['accuracy'])
        return finetuned_model
This is how these functions are called:
train_batches = DataGenerator(inputs=train.X.values,
                              labels=train.y.values,
                              img_size=img_size,
                              input_shape=input_shape,
                              batch_size=batch_size,
                              num_classes=len(CLASSES))
validate_batches = DataGenerator(inputs=validate.X.values,
                                 labels=validate.y.values,
                                 img_size=img_size,
                                 input_shape=input_shape,
                                 batch_size=batch_size,
                                 num_classes=len(CLASSES),
                                 validation=True)

if model_name == "cnn":
    model = models.simpleCNN(optimizer=Adam(lr=0.0001))
elif model_name == "resnet":
    model = models.resnet50(optimizer=Adam(lr=0.0001))

early_stopping = EarlyStopping(patience=15)
checkpointer = ModelCheckpoint(output_name + '_best.h5', verbose=1, save_best_only=True)
history = model.fit_generator(train_batches, steps_per_epoch=num_train_steps, epochs=epochs,
                              callbacks=[early_stopping, checkpointer], validation_data=validate_batches,
                              validation_steps=num_valid_steps)
I finally found the principal element that caused this over-fitting behaviour. Since I use a pre-trained model, I had set the layers as non-trainable. I tried making them trainable instead, and it seems that this solves the problem:

for layer in model.layers:
    layer.trainable = True  # was False, which kept the ImageNet features from adapting
My hypothesis is that my images are too far from the data used to pre-train the model.
I also added some dropout and batch normalization at the end of the ResNet model.
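For illustration, a sketch of what the revised resnet50 method might look like with trainable layers and the extra dropout/batch-normalization head (the dropout rate is an assumption, and BatchNormalization/Dropout need to be imported from keras.layers):

def resnet50(self, optimizer):
    model = keras.applications.resnet50.ResNet50(include_top=False,
                                                 input_shape=self.input_shape,
                                                 weights='imagenet')
    for layer in model.layers:
        layer.trainable = True  # fine-tune the whole backbone

    output = Flatten()(model.output)
    output = BatchNormalization()(output)
    output = Dropout(0.5)(output)
    output = Dense(len(self.classes), activation='softmax')(output)

    finetuned_model = Model(inputs=model.input, outputs=output)
    finetuned_model.compile(optimizer=optimizer,
                            loss=keras.losses.binary_crossentropy,
                            metrics=['accuracy'])
    return finetuned_model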
I am training a VGG-like convnet (like in the example http://keras.io/examples/) with a set of images. I convert images to arrays and resize them using scipy:
mapper = []  # list of photo ids
data = np.empty((NB_FILES, 3, 100, 100)).astype('float32')
i = 0
for f in onlyfiles[:NB_FILES]:
    img = load_img(mypath + f)
    a = img_to_array(img)
    a_resize = np.empty((3, 100, 100))
    a_resize[0,:,:] = sp.misc.imresize(a[0,:,:], (100,100)) / 255.0  # - 0.5
    a_resize[1,:,:] = sp.misc.imresize(a[1,:,:], (100,100)) / 255.0  # - 0.5
    a_resize[2,:,:] = sp.misc.imresize(a[2,:,:], (100,100)) / 255.0  # - 0.5
    photo_id = int(f.split('.')[0])
    mapper.append(photo_id)
    data[i, :, :, :] = a_resize
    i += 1
In the last dense layer I have 2 neurons and I activate with softmax. Here are the last lines:
model.add(Dense(2))
model.add(Activation('softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
model.fit(data, target_matrix, batch_size=32, nb_epoch=2, verbose=1, show_accuracy=True, validation_split=0.2)
I am not able to reduce the loss, and every epoch has the same loss and the same accuracy as the one before. The loss actually goes up between the 1st and 2nd epochs:
Train on 1600 samples, validate on 400 samples
Epoch 1/5
1600/1600 [==============================] - 23s - loss: 3.4371 - acc: 0.7744 - val_loss: 3.8280 - val_acc: 0.7625
Epoch 2/5
1600/1600 [==============================] - 23s - loss: 3.4855 - acc: 0.7837 - val_loss: 3.8280 - val_acc: 0.7625
Epoch 3/5
1600/1600 [==============================] - 23s - loss: 3.4855 - acc: 0.7837 - val_loss: 3.8280 - val_acc: 0.7625
Epoch 4/5
1600/1600 [==============================] - 23s - loss: 3.4855 - acc: 0.7837 - val_loss: 3.8280 - val_acc: 0.7625
Epoch 5/5
1600/1600 [==============================] - 23s - loss: 3.4855 - acc: 0.7837 - val_loss: 3.8280 - val_acc: 0.7625
What am I doing wrong?
From my experience, this often happens when the learning rate is too high: the optimizer is unable to find a minimum and just "turns around".
The ideal rate will depend on your data and on the architecture of your network.
(As a reference, I'm currently running a convnet with 8 layers on a sample size similar to yours, and the same lack of convergence could be observed until I reduced the learning rate to 0.001.)
My suggestions would be to reduce the learning rate (see the sketch below) and to try data augmentation.
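As an illustration, the same SGD setup as in the question with the learning rate reduced from 0.1 to 0.001 (the value that worked in my case):

# same optimizer settings, only the learning rate is lowered
sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)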
Data augmentation code:
print('Using real-time data augmentation.')

# this will do preprocessing and realtime data augmentation
datagen = ImageDataGenerator(
    featurewise_center=False,             # set input mean to 0 over the dataset
    samplewise_center=False,              # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,   # divide each input by its std
    zca_whitening=True,                   # apply ZCA whitening
    rotation_range=90,                    # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,                # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,               # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,                 # randomly flip images horizontally
    vertical_flip=False)                  # don't flip vertically

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(X_train)

# fit the model on the batches generated by datagen.flow()
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                    samples_per_epoch=X_train.shape[0],
                    nb_epoch=nb_epoch)